Question:
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291,312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5))
I would like to merge the two data frames using the freetext column as the by key. However, the text is not exactly the same in both, since some words have been removed or changed.
Is there a way to find, for each row, the row in the other data frame that shares the largest number of words, and merge on that?
Here is an example of the expected output:
df3 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5), numid = c(291,312))
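The matching the question describes (pair each row with the row that shares the most words) can be sketched in base R without extra packages. This is a minimal sketch, assuming words are separated by whitespace, case is ignored, and each df2 row should take the numid of the df1 row it shares the most distinct words with; the variables words1, words2, overlap and best are illustrative names, not part of any package.

```r
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291, 312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3, 5))

# Split each string into lowercase words (assumes whitespace-separated words)
words1 <- strsplit(tolower(df1$freetext), "\\s+")
words2 <- strsplit(tolower(df2$freetext), "\\s+")

# overlap[i, j] = number of distinct words shared by df1 row i and df2 row j
overlap <- outer(seq_along(words1), seq_along(words2),
                 Vectorize(function(i, j) length(intersect(words1[[i]], words2[[j]]))))

# For each df2 row (column of overlap), pick the df1 row with the most shared words;
# on ties which.max takes the first one
best <- apply(overlap, 2, which.max)

df3 <- data.frame(freetext = df2$freetext, aid = df2$aid, numid = df1$numid[best])
df3
```

Note that which.max always returns some row, even when the best overlap is zero, so on real data you would probably want a minimum-overlap threshold before accepting a match.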
Answer 1:
Perhaps you can look into the stringdist joins from fuzzyjoin and tune the max_dist parameter to a value that suits your data.
fuzzyjoin::stringdist_inner_join(df1, df2, by = 'freetext', max_dist = 10)
# freetext.x numid freetext.y aid
# <chr> <dbl> <chr> <dbl>
#1 open until monday night 291 open until night 3
#2 one more time to insert your coin 312 one time to insert your be 5
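A suitable max_dist depends on how far apart the true matches are. One way to get a feel for it, sketched here with base R's adist() rather than with fuzzyjoin itself, is to look at the full distance matrix and keep each row's nearest neighbour. adist() computes a generalized Levenshtein distance, which may differ slightly from the metric stringdist_inner_join uses by default, so treat the numbers as a rough guide; the names d, nearest and matched are illustrative.

```r
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291, 312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3, 5))

# Edit distance between every df1 string (rows) and every df2 string (columns)
d <- adist(df1$freetext, df2$freetext)
print(d)  # inspect these values to choose a max_dist threshold

# For each df1 row, keep only the closest df2 row (first one on ties)
nearest <- unname(apply(d, 1, which.min))
matched <- data.frame(freetext.x = df1$freetext, numid = df1$numid,
                      freetext.y = df2$freetext[nearest], aid = df2$aid[nearest])
matched
```

Unlike a max_dist join, taking the nearest neighbour always returns exactly one match per row, which suits data where every row is known to have a counterpart.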
Source: https://stackoverflow.com/questions/62739647/merge-two-dataframe-by-rows-using-common-words