Wednesday, 7 August 2013

Join results in more than 2^31 rows (internal vecseq reached physical limit)

I just tried merging two tables in R 3.0.1 (data.table version 1.8.8) on a
machine with 64 GB of RAM and got the error shown below. Help would be
appreciated.
Here is what my code looks like:

library(parallel)
library(data.table)

data1 has 24m observations and 3 variables (tag, prod, v). There are 750k
unique values of tag, anywhere from 1 to 1000 prods per tag, and 5000
possible values for prod. v takes any positive real value.

setkey(data1, tag)
merge(data1, data1, allow.cartesian=TRUE)
I get the following error:

Error in vecseq(f_, len_, if (allow.cartesian) NULL else as.integer(max(nrow(x), :
  Join results in more than 2^31 rows (internal vecseq reached physical
  limit). Very likely misspecified join. Check for duplicate key values in
  i, each of which join to the same group in x over and over again. If
  that's ok, try including j and dropping by (by-without-by) so that j runs
  for each group to avoid the large allocation. Otherwise, please search
  for this error message in the FAQ, Wiki, Stack Overflow and
  datatable-help for advice.
Calls: merge -> merge.data.table -> [ -> [.data.table -> vecseq
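The join is not misspecified: this self-merge on tag, with every row paired
against every other row sharing its tag, is exactly what I want.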
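
Before retrying, it may help to count how many rows the join would produce. A self-join on tag pairs each row with every row that has the same tag, so a tag with N rows contributes N^2 rows to the result. With 24m rows over 750k tags (32 per tag on average) the total is at least 24m^2 / 750k = 768 million even if rows were spread perfectly evenly, and any skew toward the 1000-prods-per-tag end pushes it higher. The sketch below runs that count on a small made-up table; `toy`, `counts`, `expected_rows` and `limit` are illustrative names, not part of the original code, and the values are invented.

```r
library(data.table)

# Toy stand-in for data1 (hypothetical values):
# tag "a" has 3 rows, "b" has 2 rows, "c" has 1 row.
toy <- data.table(
  tag  = c("a", "a", "a", "b", "b", "c"),
  prod = c(1L, 2L, 3L, 1L, 2L, 1L),
  v    = c(0.5, 1.2, 3.3, 0.7, 2.1, 4.0)
)

# Number of rows per key value.
counts <- toy[, .N, by = tag]

# Each tag contributes N^2 rows to a self-join on tag.
# as.numeric() keeps the sum from overflowing R's 32-bit integers.
expected_rows <- counts[, sum(as.numeric(N)^2)]

# The row limit named in the error message.
limit <- 2^31

expected_rows          # 14 for the toy table (9 + 4 + 1)
expected_rows > limit  # FALSE here
```

Running the same two lines on data1 instead of `toy` would show whether the full result can be held in a single table at all. If the count is above 2^31, the result has to be computed per group or in batches of tags rather than materialised in one merge.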
