Question
The function below, from the "stringdist" package in R, returns the minimum number of edits needed to change one string into another. I would like to express how similar one string is to another as a percentage. Please help, and thanks.
stringdist("abc","abcd", method = "lv")
Answer 1:
You can use the RecordLinkage package and its levenshteinSim function, i.e.
#This gives the similarity
RecordLinkage::levenshteinSim('abc', 'abcd')
#[1] 0.75
#so to get the distance just subtract from 1,
1 - RecordLinkage::levenshteinSim('abc', 'abcd')
#[1] 0.25
Answer 2:
Something like this might work:
library(stringdist)
d <- data.frame(original = c("abcd", "defg", "hij"), new = c("abce", "zxyv", "hijk"))
d$dist <- stringdist(d$original, d$new, method = "lv")
d$similarity <- 1 - d$dist / nchar(as.character(d$original))
#### Returns:
####   original  new dist similarity
#### 1     abcd abce    1  0.7500000
#### 2     defg zxyv    4  0.0000000
#### 3      hij hijk    1  0.6666667
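One thing to watch with the approach above: because the distance is divided by the length of the original string only, the similarity can drop below 0 when the new string is much longer (for example "ab" vs "abcdef" gives 1 - 4/2 = -1). A minimal sketch of one way around this, assuming you are happy to normalise by the longer of the two strings and to use base R's adist() in place of stringdist(); the fourth row and the percent column are additions for illustration and are not part of the original answer:

```r
# Illustrative data: the first three rows come from the answer above,
# the fourth row is an added example where the new string is much longer.
d <- data.frame(original = c("abcd", "defg", "hij", "ab"),
                new      = c("abce", "zxyv", "hijk", "abcdef"),
                stringsAsFactors = FALSE)

# adist() returns the full pairwise distance matrix;
# its diagonal holds the row-by-row Levenshtein distances.
d$dist <- diag(adist(d$original, d$new))

# Dividing by the longer string keeps the similarity between 0 and 1.
d$similarity <- 1 - d$dist / pmax(nchar(d$original), nchar(d$new))

# Format as a percentage string with one decimal place.
d$percent <- sprintf("%.1f%%", 100 * d$similarity)
```

With this normalisation "hij" vs "hijk" scores 75.0% rather than 66.7%, so the two conventions are not interchangeable; pick whichever matches what "similar" should mean for your data.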
Answer 3:
Here is a function in base R. I added a check for vectors of equal length as inputs. You could change this logic if desired.
strSim <- function(v1, v2) {
  # Both inputs must have the same length
  if (length(v1) == length(v2)) {
    1 - (adist(v1, v2) / pmax(nchar(v1), nchar(v2)))
  } else {
    stop("vector lengths not equal")
  }
}
This returns:
strSim("abc", "abcd")
     [,1]
[1,] 0.75
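Note that adist() compares every element of v1 with every element of v2, so for inputs longer than one element the function above returns an n x n matrix, and as far as I can tell only its diagonal is normalised by the matching pair's length. If you want one similarity per pair, a possible variant looks like this (strSimPairs is a made-up name for illustration, not part of the original answer):

```r
# Element-wise variant: returns a plain numeric vector, one value per pair.
strSimPairs <- function(v1, v2) {
  if (length(v1) != length(v2)) stop("vector lengths not equal")
  # Compute the Levenshtein distance for each (v1[i], v2[i]) pair only.
  dists <- mapply(function(a, b) adist(a, b)[1, 1], v1, v2, USE.NAMES = FALSE)
  # Normalise by the longer string of each pair.
  1 - dists / pmax(nchar(v1), nchar(v2))
}

strSimPairs(c("abc", "hello"), c("abcd", "help"))
# [1] 0.75 0.60
```

Multiply the result by 100 if you want percentages. A pair of two empty strings would give NaN (0 divided by 0), so handle that case separately if it can occur in your data.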
Source: https://stackoverflow.com/questions/46446485/calculating-string-similarity-as-a-percentage