Merge two data frames by rows using common words [duplicate]

北战南征 submitted on 2020-07-15 08:32:08

Question


df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291,312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5))

I would like to merge the two data frames using the freetext column as the by key. However, the text does not match exactly, because some words have been removed or changed.
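To illustrate the problem, here is a short check (using the question's sample data, with stringsAsFactors = FALSE added so the text stays character): an ordinary exact merge on freetext finds nothing to join, because no string appears verbatim in both data frames.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps freetext as character
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"),
                  numid = c(291, 312), stringsAsFactors = FALSE)
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"),
                  aid = c(3, 5), stringsAsFactors = FALSE)

# Exact join on freetext: no value is identical in both data frames,
# so the result has zero rows
exact <- merge(df1, df2, by = "freetext")
nrow(exact)
```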

Is there any way to find, for each row, the row in the other data frame that shares the largest number of words, and merge on that?

Here is an example of the expected output:

df3 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5), numid = c(291,312))
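The shared-word idea from the question can also be done in base R without extra packages. A minimal sketch, assuming words are separated by whitespace, case is ignored, each distinct word counts once, and ties go to the first df2 row; the names words1, words2, overlap and best are illustrative, not from any package.

```r
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"),
                  numid = c(291, 312), stringsAsFactors = FALSE)
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"),
                  aid = c(3, 5), stringsAsFactors = FALSE)

# Split each string into a vector of lowercase words (assumption: whitespace-separated)
words1 <- strsplit(tolower(df1$freetext), "\\s+")
words2 <- strsplit(tolower(df2$freetext), "\\s+")

# overlap[i, j] = number of distinct words shared by df1 row i and df2 row j
count_shared <- function(i, j) length(intersect(words1[[i]], words2[[j]]))
overlap <- outer(seq_along(words1), seq_along(words2), Vectorize(count_shared))

# For each df1 row, the df2 row with the most shared words (which.max takes the first on ties)
best <- apply(overlap, 1, which.max)

# Assemble the result in the column order of the expected output
df3 <- data.frame(freetext = df2$freetext[best],
                  aid = df2$aid[best],
                  numid = df1$numid,
                  stringsAsFactors = FALSE)
print(df3)
```

For the sample data the overlap counts are 3 and 5 on the matching pairs and 0 elsewhere, which reproduces the expected df3. For longer texts, a normalised score such as the number of shared words divided by the size of the union (Jaccard similarity) may be fairer than a raw count.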

Answer 1:


Perhaps you can look into the stringdist joins from the fuzzyjoin package and tune the max_dist parameter to suit your data.

fuzzyjoin::stringdist_inner_join(df1, df2, by = 'freetext', max_dist = 10)

#  freetext.x                        numid freetext.y                   aid
#  <chr>                             <dbl> <chr>                      <dbl>
#1 open until monday night             291 open until night               3
#2 one more time to insert your coin   312 one time to insert your be     5
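max_dist is a threshold on the string distance between the two freetext values. To get a feel for which value suits your data, you can inspect the distances with base R's adist(), which computes the Levenshtein distance. fuzzyjoin's default method is "osa" from the stringdist package, so the numbers may occasionally differ. A minimal sketch, assuming one match per df1 row is wanted: unlike stringdist_inner_join, which keeps every pair within max_dist, it keeps only the nearest df2 row. The names d, nearest and keep are illustrative.

```r
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"),
                  numid = c(291, 312), stringsAsFactors = FALSE)
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"),
                  aid = c(3, 5), stringsAsFactors = FALSE)

max_dist <- 10  # same threshold as in the fuzzyjoin call

# d[i, j] = Levenshtein distance between df1$freetext[i] and df2$freetext[j]
d <- adist(df1$freetext, df2$freetext)

# Nearest df2 row for every df1 row (assumption: only the single best match is kept)
nearest <- apply(d, 1, which.min)
nearest_dist <- d[cbind(seq_len(nrow(d)), nearest)]

# Drop df1 rows whose nearest match is farther than max_dist, like an inner join
keep <- nearest_dist <= max_dist
result <- data.frame(freetext.x = df1$freetext[keep],
                     numid = df1$numid[keep],
                     freetext.y = df2$freetext[nearest[keep]],
                     aid = df2$aid[nearest[keep]],
                     dist = nearest_dist[keep],
                     stringsAsFactors = FALSE)
print(result)
```

Here the matching pairs are 7 and 9 edits apart, while the mismatched pairs are well above 10, which is why max_dist = 10 separates them cleanly. With real data, looking at the distance matrix first makes it easier to choose a threshold that does not admit wrong pairs.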


Source: https://stackoverflow.com/questions/62739647/merge-two-dataframe-by-rows-using-common-words
