Calculating string similarity as a percentage

有刺的猬 2021-01-21 01:02

The given function uses the "stringdist" package in R and reports the minimum number of changes needed to turn one string into another. I wish to find out how similar one string is to an

3 Answers
  • 2021-01-21 01:08

    Here is a function in base R. I added a check for vectors of equal length as inputs. You could change this logic if desired.

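    strSim <- function(v1, v2) {
      # Both inputs must have the same length
      if (length(v1) != length(v2)) stop("vector lengths not equal")
      # Levenshtein distance from adist(), scaled by the longer string's length
      1 - (adist(v1, v2) / pmax(nchar(v1), nchar(v2)))
    }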
    

    For example, this returns:

    strSim("abc", "abcd")
         [,1]
    [1,] 0.75
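
A note on vector inputs: adist() returns the full length(v1) by length(v2) matrix of pairwise distances, so when the inputs have more than one element only the diagonal of strSim()'s result compares v1[i] with v2[i]. Below is a minimal sketch of an element-wise variant that reports a percentage, assuming that is the goal. The name str_sim_pct and the choice to treat two empty strings as 100% similar are illustrative assumptions, not part of the answer above.

```r
# Hypothetical helper: element-wise Levenshtein similarity in percent (base R only).
str_sim_pct <- function(v1, v2) {
  stopifnot(length(v1) == length(v2))
  # Edit distance for each pair v1[i], v2[i]; adist() returns a 1 x 1 matrix here
  d <- vapply(seq_along(v1), function(i) adist(v1[i], v2[i])[1, 1], numeric(1))
  longest <- pmax(nchar(v1), nchar(v2))
  # Assumption: two empty strings count as identical (avoids 0 / 0)
  ifelse(longest == 0, 100, 100 * (1 - d / longest))
}

str_sim_pct(c("abc", "defg", "same"), c("abcd", "zxyv", "same"))
# 75, 0, 100
```

Computing one pair at a time also avoids building the full pairwise matrix, which may matter for long vectors.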
    
  • 2021-01-21 01:17

    Something like this might work:

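    library(stringdist)
    d <- data.frame(original = c("abcd", "defg", "hij"), new = c("abce", "zxyv", "hijk"))
    # Levenshtein distance between each original/new pair
    d$dist <- stringdist(d$original, d$new, method = "lv")
    # Similarity relative to the length of the original string
    d$similarity <- 1 - d$dist / nchar(as.character(d$original))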
    
    #### Returns:
    ####   original  new dist similarity
    #### 1     abcd abce    1  0.7500000
    #### 2     defg zxyv    4  0.0000000
    #### 3      hij hijk    1  0.6666667
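
One thing to watch with this normalization: dividing by the length of the original string can yield a negative similarity when the new string is much longer. The sketch below compares that denominator with the length of the longer string. To stay dependency-free it uses a hypothetical base R helper, lev(), built on adist() as a stand-in for stringdist(..., method = "lv"); the helper, the column names, and the sample data are assumptions for illustration only.

```r
# Hypothetical stand-in for stringdist(a, b, method = "lv"), using base R's adist()
lev <- function(a, b) {
  mapply(function(x, y) adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
}

d <- data.frame(original = c("abcd", "ab"),
                new = c("abce", "abcdef"),
                stringsAsFactors = FALSE)
d$dist <- lev(d$original, d$new)

# Denominator used above: length of the original string only
d$sim_by_original <- 1 - d$dist / nchar(d$original)
# Alternative: length of the longer string, which keeps the result within [0, 1]
d$sim_by_longer <- 1 - d$dist / pmax(nchar(d$original), nchar(d$new))

print(d)
```

For the second row, turning "ab" into "abcdef" takes four insertions, so the first formula gives 1 - 4/2 = -1, while the second gives 1 - 4/6, roughly 0.33.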
    
  • 2021-01-21 01:32

    You can use the RecordLinkage package and its levenshteinSim function, e.g.

    #This gives the similarity
    RecordLinkage::levenshteinSim('abc', 'abcd')
    #[1] 0.75
    
    #so to get the normalized distance, just subtract the similarity from 1
    1 - RecordLinkage::levenshteinSim('abc', 'abcd')
    #[1] 0.25
    