Computing the Levenshtein ratio of each element of a data.table with each value of a reference table and merging on the maximum ratio

烂漫一生 posted on 2019-12-03 09:11:08

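You can try something like this: compute the full matrix of Levenshtein ratios between the two name columns, then keep the maximum of each row and the position(s) where it occurs.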

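library(stringdist)   # stringdistmatrix()
library(matrixStats)  # rowMaxs()

f1 <- function(x, y) {
  # Levenshtein distance between every element of x (rows) and y (columns)
  dis <- stringdistmatrix(x, y, method = "lv")
  # length of the longer string in each pair, same shape as dis
  mat <- outer(nchar(x), nchar(y), pmax)
  # ratio in [0, 1]; 1 means the two strings are identical
  r <- 1 - dis / mat
  # column position(s) of the maximum in each row; ties give several positions.
  # lapply always returns a list, whereas apply may simplify to a vector or matrix.
  w <- lapply(seq_len(nrow(r)), function(i) which(r[i, ] == max(r[i, ])))
  m <- rowMaxs(r)
  list(m = m, w = w)
}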
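
To make the ratio computed inside f1 concrete, here is a minimal sketch that applies the same formula (1 minus the distance divided by the length of the longer string) to a few strings. It assumes base R's adist() from utils as a stand-in for stringdistmatrix(), only so that it runs without extra packages; the helper name lev_ratio is hypothetical and not part of the answer above.

```r
# Hypothetical helper: Levenshtein ratio for every pair of x and y.
# utils::adist() is swapped in for stringdist::stringdistmatrix() here.
lev_ratio <- function(x, y) {
  dis <- utils::adist(x, y)               # length(x) by length(y) edit distances
  len <- outer(nchar(x), nchar(y), pmax)  # longer string length for each pair
  1 - dis / len
}

# "kitten" -> "sitting" needs 3 edits and the longer string has 7 characters,
# so the ratio is 1 - 3/7.
lev_ratio("kitten", "sitting")

# One row per element of x, one column per element of y.
lev_ratio("pear", c("pear", "bear", "apple"))
```

A ratio of 1 means an exact match, and two reference values at the same distance from a name produce a tie, which is why f1 returns w as a list of positions rather than a single index per row.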

r <- f1(dt[[2]], dt.ref[[2]])
r
$m
[1] 1.0000000 0.7500000 0.3333333 0.8000000

$w
$w[[1]]
[1] 1

$w[[2]]
[1] 3 4

$w[[3]]
[1] 5

$w[[4]]
[1] 6


dt[, maxLr := r$m ]
#dtnew <- dt[rep(1:.N, sapply(r$w, length)),]
dtnew <- dt[rep(1:.N, lengths(r$w)), ] # thanks to Frank
dtnew[, cid := dt.ref[[1]][unlist(r$w)]]  # ids from the first column of dt.ref

Results:

dtnew
   nid  rname maxr     maxLr cid
1:  n1  apple  0.5 1.0000000  c1
2:  n2   pear  0.8 0.7500000  c3
3:  n2   pear  0.8 0.7500000  c4
4:  n3 banana  0.7 0.3333333  c5
5:  n4   kiwi  0.6 0.8000000  c6
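
The row expansion used to build dtnew can be sketched in isolation. The toy tables and the hand-written list w below are made up for illustration (they are not the data from the question); w plays the role of r$w, with the second row tied between two reference rows.

```r
library(data.table)

# Hypothetical toy data; names and values are invented for this sketch.
dt     <- data.table(nid = c("n1", "n2"), rname = c("apple", "pear"))
dt.ref <- data.table(cid = c("c1", "c2", "c3"),
                     cname = c("apple", "bear", "peer"))

# Positions of the best-matching reference rows, one list element per row of dt.
# Written by hand here; in the answer above this comes from f1() as r$w.
w <- list(1L, c(2L, 3L))

# Repeat each row of dt once per tied match, then attach the matching ids.
dtnew <- dt[rep(seq_len(.N), lengths(w))]
dtnew[, cid := dt.ref$cid[unlist(w)]]
dtnew
```

Because rep() repeats row i exactly lengths(w)[i] times and unlist(w) flattens the positions in the same order, the repeated rows and the reference ids line up one to one.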