Question
I want to be able to fuzzy match one column and exact match another column.
Say df1 looks like this:
And df2 looks like this:
I want to fuzzy match the "Name" but exact match the "Year." So "Ashley" and "Ashlee" would be a match. This is what I have so far:
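library(dplyr)
library(fuzzyjoin)
library(stringdist)

res <- fuzzy_left_join(
  df1,
  df2,
  by = c("Year", "Name"),
  # exact match on Year, Levenshtein distance of at most 3 on Name
  match_fun = list(`==`, function(x, y) stringdist(tolower(x), tolower(y), method = "lv") <= 3)
)
res %>%
  select(Year = Year.x, everything(), -Year.y)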
It appears to be over-matching, though. Not sure what's going on.
Answer 1:
It seems you are on the right track (hard to tell without your data or you showing us your result!)
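The fuzzy join returns every pair of rows whose names are within a string distance of 3 (and whose years are equal), not just the closest one, which may be the "over-matching" you describe. A threshold of 3 is also quite loose for short names.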
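To see how loose a Levenshtein threshold of 3 is, here is a small sketch with made-up names (none of them come from the question's data). It only uses `stringdist()` from the stringdist package, with the same `method = "lv"` as in the question.

```r
library(stringdist)

# Hypothetical names, chosen only to illustrate the threshold.
d_typo      <- stringdist("ashley", "ashlee", method = "lv")  # 1: one substitution
d_short     <- stringdist("amy", "ann", method = "lv")        # 2: two substitutions
d_unrelated <- stringdist("amy", "eva", method = "lv")        # 3: every letter differs

# All three pairs pass the "<= 3" test used in the join.
c(d_typo, d_short, d_unrelated) <= 3
```

Any two names of three letters or fewer are at most 3 edits apart, so with this threshold they always count as a match. Lowering the threshold is one way to reduce over-matching; keeping only the closest candidate is another.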
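You can use
%>% group_by(Year,Name) %>% slice_min(dist)
to keep only the best match in each group according to distance. Note that this assumes the joined result has a dist column holding the string distance; the match function in the question only returns TRUE/FALSE, so dist has to be computed first (for example with mutate()). The join also suffixes the shared column names with .x and .y, so the grouping columns may need to be adjusted accordingly.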
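Put together, the approach could look like the sketch below. The two data frames are hypothetical stand-ins, since the question's data was not shown, and the `dist` and `Match` column names are chosen here for illustration. The distance is recomputed after the join with `mutate()`, and the grouping uses the suffixed `Year.x` and `Name.x` columns.

```r
library(dplyr)
library(fuzzyjoin)
library(stringdist)

# Hypothetical stand-ins for df1 and df2 (assumed data, not from the question).
df1 <- tibble(Year = c(2019, 2019, 2020),
              Name = c("Ashley", "Amy", "Jordan"))
df2 <- tibble(Year = c(2019, 2019, 2019, 2020),
              Name = c("Ashlee", "Ashleigh", "Ann", "Jordyn"))

# Exact match on Year, Levenshtein distance <= 3 on Name (as in the question).
res <- fuzzy_left_join(
  df1, df2,
  by = c("Year", "Name"),
  match_fun = list(
    `==`,
    function(x, y) stringdist(tolower(x), tolower(y), method = "lv") <= 3
  )
)

# "Ashley" matches both "Ashlee" (distance 1) and "Ashleigh" (distance 3),
# so keep only the closest candidate for each row of df1.
best <- res %>%
  mutate(dist = stringdist(tolower(Name.x), tolower(Name.y), method = "lv")) %>%
  group_by(Year.x, Name.x) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Year = Year.x, Name = Name.x, Match = Name.y, dist)
```

With `with_ties = FALSE`, one row is kept per group even when two candidates are equally close; drop that argument if you would rather see all tied candidates.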
Source: https://stackoverflow.com/questions/58442426/how-do-i-do-one-fuzzy-and-one-exact-match-in-a-dataframe