Question
# example
a <- data.frame(name=c("A","B","C"), KW=c(201902,201904,201905),price=c(1.99,3.02,5.00))
b <- data.frame(KW=c(201903,201904,201904),price=c(1.98,3.00,5.00),name=c("a","b","c"))
I want to match a and b with fuzzy logic, using the variables KW and price. I want to allow a tolerance of +/- 1 for KW and a tolerance of +/- 0.02 for price.
The desired outcome should look like this:
name.x KW.x price.x KW.y price.y name.y
1 A 201902 1.99 201903 1.98 a
2 B 201904 3.02 201904 3.00 b
3 C 201905 5.00 201904 5.00 c
I would prefer a solution using the fuzzyjoin package. So far I have tried the fuzzy_inner_join function, specifying my desired tolerances for KW and price via the match_fun argument, but I couldn't get it to work.
How can I solve this problem?
Answer 1:
You can create a Cartesian product of the two data frames using merge and then subset the rows that satisfy the required conditions.
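# Cross join, then keep the pairs within tolerance. The small 1e-8 slack guards
# against floating-point rounding: 3.02 - 3.00 is slightly above 0.02 in doubles.
subset(merge(a, b, by = NULL), abs(KW.x - KW.y) <= 1 &
                               abs(price.x - price.y) <= 0.02 + 1e-8)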
# name.x KW.x price.x KW.y price.y name.y
#1 A 201902 1.99 201903 1.98 a
#5 B 201904 3.02 201904 3.00 b
#9 C 201905 5.00 201904 5.00 c
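Since the question asked for fuzzyjoin specifically, here is a minimal sketch of the same match using fuzzy_inner_join, assuming the fuzzyjoin package is installed. The match_fun argument accepts a list of vectorized predicates, one per column named in by. The helper within_tol and its eps slack are hypothetical names introduced here for illustration, not part of fuzzyjoin. The slack is an assumption added because 3.02 - 3.00 evaluates to slightly more than 0.02 in double precision, so a strict `<= 0.02` test can miss the B/b pair.

```r
library(fuzzyjoin)

a <- data.frame(name = c("A", "B", "C"), KW = c(201902, 201904, 201905),
                price = c(1.99, 3.02, 5.00))
b <- data.frame(KW = c(201903, 201904, 201904), price = c(1.98, 3.00, 5.00),
                name = c("a", "b", "c"))

# Hypothetical helper (not a fuzzyjoin function): returns a vectorized
# predicate that is TRUE when two values differ by at most `tol`.
# `eps` is an assumed slack to absorb floating-point rounding error.
within_tol <- function(tol, eps = 1e-8) {
  function(x, y) abs(x - y) <= tol + eps
}

# One predicate per `by` column, in the same order: KW first, then price.
res <- fuzzy_inner_join(
  a, b,
  by = c("KW", "price"),
  match_fun = list(within_tol(1), within_tol(0.02))
)
res
```

A row pair is kept only if every predicate holds, so this should yield the same A/a, B/b and C/c pairs as the merge-and-subset approach. Unlike the explicit cross join, it keeps the tolerance logic in small reusable functions.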
Source: https://stackoverflow.com/questions/60883566/how-to-fuzzy-join-2-dataframes-on-2-variables-with-differing-fuzzy-logic