Calculating string similarity as a percentage


The given function uses the "stringdist" package in R and reports the minimum number of changes needed to turn one string into another. I want to find out how similar one string is to an...

3 Answers

隐瞒了意图╮

2021-01-21 01:08
Here is a function in base R. I added a check that both input vectors have the same length; you could change this logic if desired.
```
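strSim <- function(v1, v2) {
  # Require equal-length inputs; adist() returns a matrix of edit distances
  if (length(v1) != length(v2)) stop("vector lengths not equal")
  # Scale by the longer string's length and subtract from 1
  1 - (adist(v1, v2) / pmax(nchar(v1), nchar(v2)))
}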
```
This returns:
```
strSim("abc", "abcd")
     [,1]
[1,] 0.75
```
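Because `adist()` compares every element of its first argument with every element of its second, passing vectors longer than one to the function above gives a full matrix rather than one value per pair. If one value per pair is what you want, a minimal sketch could look like the following. The name `strSimPairwise` is hypothetical and not part of the answer above; it only assumes base R.

```r
# Hypothetical helper: one similarity value per (v1[i], v2[i]) pair
strSimPairwise <- function(v1, v2) {
  stopifnot(length(v1) == length(v2))
  # Compute the Levenshtein distance for each pair separately
  d <- mapply(function(a, b) adist(a, b)[1, 1], v1, v2, USE.NAMES = FALSE)
  # Scale by the longer string of each pair
  1 - d / pmax(nchar(v1), nchar(v2))
}

strSimPairwise(c("abcd", "defg", "hij"), c("abce", "zxyv", "hijk"))
# [1] 0.75 0.00 0.75
```

Dividing by the longer string keeps each result between 0 and 1 whenever at least one string of the pair is non-empty.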

无人及你

2021-01-21 01:17

Something like this might work:

```
library(stringdist)

d <- data.frame(original = c("abcd", "defg", "hij"), new = c("abce", "zxyv", "hijk"))
# Levenshtein distance between each original/new pair
d$dist <- stringdist(d$original, d$new, method = "lv")
# Similarity relative to the length of the original string
d$similarity <- 1 - d$dist / nchar(as.character(d$original))

# Returns:
#   original  new dist similarity
# 1     abcd abce    1  0.7500000
# 2     defg zxyv    4  0.0000000
# 3      hij hijk    1  0.6666667
```
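Note that the snippet above divides by the length of the original string, so the result can drop below zero when the new string is much longer than the original. The sketch below compares the two denominators and converts the result to a percentage. It is only an illustration: the input values and variable names are made up, and it uses base R's `adist()` rather than `stringdist` so that it runs without extra packages.

```r
# Illustrative inputs, not taken from the answer above
original <- c("a", "hij")
new      <- c("abcd", "hijk")

# Levenshtein distance for each original/new pair
lev <- mapply(function(a, b) adist(a, b)[1, 1], original, new, USE.NAMES = FALSE)

# Dividing by the original's length can go negative
sim_original <- 1 - lev / nchar(original)
# Dividing by the longer string's length stays between 0 and 1
sim_longest <- 1 - lev / pmax(nchar(original), nchar(new))

# Express the bounded similarity as a percentage
pct <- round(100 * sim_longest, 1)
pct
# [1] 25 75
```

Here `sim_original` for the pair "a"/"abcd" is -2, while `sim_longest` is 0.25, which is why the longer-string denominator is usually the safer choice for a percentage.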


遇见更好的自我

2021-01-21 01:32

You can use the levenshteinSim function from the RecordLinkage package:

```
# This gives the similarity
RecordLinkage::levenshteinSim('abc', 'abcd')
# [1] 0.75

# To get the normalized distance, subtract the similarity from 1
1 - RecordLinkage::levenshteinSim('abc', 'abcd')
# [1] 0.25
```
