Question
I am experimenting with the stringdist package to make fuzzy joins, and I have run into a problem that I do not understand and cannot find an answer for. I want to join these two tables with the "dl" method, but the result contains an NA, which I do not understand at all. Maybe one of you has an explanation for this. The code:
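library(fuzzyjoin)
library(data.table)  # provides setnames()
# One-row data frames, each holding a single string
test1 <- as.data.frame(c("techniker"))
test2 <- as.data.frame(c("technician"))
# Give both frames the same column name so they can be joined on it
setnames(test2, 1, "label")
setnames(test1, 1, "label")
x <- stringdist_join(test1, test2, by = "label", mode = "left", distance_col = "distance", method = "dl")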
If I use the jaccard method, however, there is a match:
y <- stringdist_join(test1, test2, by = "label", mode = "left", distance_col="distance", method="jaccard", q=4)
I hope someone can clarify.
Cheers, Dome
Answer 1:
max_dist is set to 2 by default. The "dl" distance between "techniker" and "technician" is more than 2, so there is no match. Raising max_dist lets the pair join:
stringdist_join(test1, test2, by = "label", mode = "left", distance_col="distance", method="dl",max_dist=5)
#     label.x    label.y distance
# 1 techniker technician        4
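To see why the two methods behave differently under the same default threshold, the distances can be computed directly. This is a minimal sketch, assuming the stringdist package is installed (fuzzyjoin relies on it); the variable names a, b, d_dl and d_jaccard are only for illustration.

```r
# Assumes the stringdist package is installed.
library(stringdist)

a <- "techniker"
b <- "technician"

# Damerau-Levenshtein: counts insertions, deletions, substitutions
# and transpositions of adjacent characters.
d_dl <- stringdist(a, b, method = "dl")

# Jaccard on 4-grams: 1 - (shared 4-grams) / (all distinct 4-grams).
# The two strings share 3 of 10 distinct 4-grams.
d_jaccard <- stringdist(a, b, method = "jaccard", q = 4)

print(d_dl)       # 4, above the default max_dist of 2
print(d_jaccard)  # 0.7, below the default max_dist of 2
```

Because the Jaccard distance always lies between 0 and 1, a max_dist of 2 never filters anything out with that method; a threshold below 1 would be needed for the join to be selective.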
Source: https://stackoverflow.com/questions/46346918/stringdist-join-results-in-nas