fuzzy matching in R

早过忘川 submitted on 2019-11-29 11:17:42

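I came across this question while answering a related one today, so I decided to answer the original as well. One approach is to fuzzy-join the two data frames with fuzzyjoin and keep, for each entry, the candidate with the smallest string distance: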

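library(dplyr)
library(fuzzyjoin)

# Pair each entry with candidate fruits by Jaro-Winkler ("jw") distance.
# jw distances lie in [0, 1], so the default max_dist of 2 keeps every pair.
df1 %>%
  stringdist_left_join(df2, by = c(entry = "fruit"), ignore_case = TRUE,
                       method = "jw", distance_col = "dist") %>%
  group_by(entry) %>%
  top_n(-1, dist) %>%   # keep the smallest dist per entry (ties are all kept)
  ungroup() %>%
  select(-dist)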

Output is:

     id entry                   fruit      code
  <dbl> <fct>                   <fct>     <dbl>
1  1.00 Apple                   apple      11.0
2  2.00 I love apples           pineapple  13.0
3  3.00 appls                   apple      11.0
4  4.00 Bannanas                banana     12.0
5  5.00 banana                  banana     12.0
6  6.00 An apple a day keeps... apple      11.0

Sample data (define these before running the code above):

df1 <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                  entry = c("Apple", "I love apples", "appls", "Bannanas", "banana", "An apple a day keeps..."))
df2 <- data.frame(fruit=c("apple", "banana", "pineapple"), code=c(11, 12, 13))
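
To check why an entry such as "I love apples" ends up paired with one fruit rather than another, it can help to look at the raw distances. The following is a minimal sketch, not part of the original answer. It assumes the stringdist package is installed (fuzzyjoin depends on it) and that ignore_case = TRUE amounts to lower-casing both sides before comparing. The names entries, fruits, d and nearest are illustrative only.

```r
library(stringdist)

entries <- c("Apple", "I love apples", "appls", "Bannanas", "banana",
             "An apple a day keeps...")
fruits  <- c("apple", "banana", "pineapple")

# Distance matrix: one row per entry, one column per fruit.
# tolower() mimics ignore_case = TRUE in the join (an assumption about its internals).
d <- stringdistmatrix(tolower(entries), fruits, method = "jw")
dimnames(d) <- list(entries, fruits)

print(round(d, 3))

# Closest fruit per entry. Note that which.min() returns only the first minimum,
# whereas top_n() in the pipeline keeps all tied rows.
nearest <- fruits[apply(d, 1, which.min)]
print(data.frame(entry = entries, nearest = nearest))
```

If a pairing looks wrong, one option is to keep the dist column in the pipeline instead of dropping it, and filter on a threshold chosen for the data at hand.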