问题
I have the following 2 data tables:
DT1 <- data.table(A = c(100,50,10), B = c("Good","Ok","Bad"))
DT1
A B
1: 100 Good
2: 50 Ok
3: 10 Bad
and
DT2 <- data.table(A = c(99,34,5,"",24,86))
DT2
A
1: 99
2: 34
3: 5
4:
5: 24
6: 86
What I would like to return when joining DT1 and DT2 is
DT2
A B
1: 99 Good
2: 34 Ok
3: 5 Bad
4: NA
5: 24 Ok
6: 86 Good
The "roll" option in data.table is only for "nearest" match so it doesnt work in my case. Is there any way I can do such lookup with data.table?
回答1:
The rolling join does work for me if rolled backwards (NOCB = next obervation carried backwards):
library(data.table)
DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
DT2 <- data.table(A = c(99, 34, 5, "", 24, 86))
DT2[, A := as.numeric(A)]
DT1[DT2, on = "A", roll = -Inf]
A B 1: 99 Good 2: 34 Ok 3: 5 Bad 4: NA <NA> 5: 24 Ok 6: 86 Good
Note that this does only work if both columns A
are numeric (or integer). By using ""
, the OP has turned DT2$A
into a character column.
回答2:
Here is a base R approach
df1 <- as.data.frame(DT1)
df2 <- as.data.frame(DT2)
df2$B <- apply(df2, 1, function(x) {
if(x != "") df1$B[which.min(abs(as.numeric(x) - df1$A))] else NA
})
df2
# A B
# 1 99 Good
# 2 34 Ok
# 3 0 Bad
# 4 <NA>
# 5 24 Bad
# 6 86 Good
Or the same using data.table
s
DT2[, B := apply(DT2, 1, function(x)
if(x != "") DT1$B[which.min(abs(as.numeric(x) - DT1$A))] else NA)]
DT2
# A B
#1: 99 Good
#2: 34 Ok
#3: 0 Bad
#4: NA
#5: 24 Bad
#6: 86 Good
We match based on the smallest absolute difference between DT1$A
and DT2$A
values.
Sample data
DT1 <- data.table(A = c(100,50,0), B = c("Good","Ok","Bad"))
DT2 <- data.table(A = c(99,34,0,"",24,86))
来源:https://stackoverflow.com/questions/51413210/how-to-find-nearest-highest-value-when-join-data-table