问题
I have two data.table dx
and dy
dx <- data.table(a = c(1,1,1,1,2,2), b = 3:8)
dy <- data.table(a = c(1,1,2), c = 7:9)
I want to join dy
to each row of dx
, and below is the desired output
data.table(plyr::ddply(dx, c("a", "b"), function(d) merge(d, dy, by = "a")))
a b c
1: 1 3 7
2: 1 3 8
3: 1 4 7
4: 1 4 8
5: 1 5 7
6: 1 5 8
7: 1 6 7
8: 1 6 8
9: 2 7 9
10: 2 8 9
However, I failed to make the output only using operation inside []
of data.table
or merge
? I have tired
merge(dx, dy, by = "a", all = TRUE)
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 10 rows; more than 9 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
dy[dx,on="a"]
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, : Join results in 10 rows; more than 9 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
dx[, merge(dy, by = "a"), by = c("a", "b")]
Error in is.data.table(y) : argument "y" is missing, with no default
dx[, merge(.SD, dy, by = "a"), by = c("a", "b")]
Error in merge.data.table(.SD, dy, by = "a") :
Elements listed in by
must be valid column names in x and y
How can I do it actually?
Thanks!
回答1:
If, I understood your requirement correctly, There is a direct merge option that you can use,
dx <- data.table(a = c(1,1,2,2), b = 3:6)
dy <- data.table(a = c(1,1,2), c = 7:9)
merge(x = dx, y = dy, by = "a", all = TRUE)
It gives your desired output that you mentioned. How to join (merge) data frames (inner, outer, left, right)?
I hope it clears your doubt if not, I am sory.
回答2:
Because of multiple 1s, the error is coming. In default merge, i.e the implementation for data.frames, allow.cartesian=TRUE
by default. But in the data.table implemetation, this is not the case. Hence, if you run the below code it will give you merge output.
merge(dx, dy, by = "a", all = TRUE, allow.cartesian=TRUE)
The above code will give you the output that you want.
来源:https://stackoverflow.com/questions/36349613/joining-data-table-with-by-argument