Passing arguments into multiple match_fun functions in R fuzzyjoin::fuzzy_join

ぐ巨炮叔叔 提交于 2019-12-04 18:14:46

I think the error is because the arguments passed into each of the multiple match_fun's mess it up i.e. can't pass extra arguments like ignore_case, originally intended for just the string_dist match_fun, into a match_fun of >=

The solution would be to define my own match_fun's with fixed parameters for arguments. See below where I define my own match_fun_stringdist with fixed parameters. I also implemented it here in another question/answer https://stackoverflow.com/a/44383103/4663008.

# First, need to define match_fun_stringdist 
# Code from stringdist_join from https://github.com/dgrtwo/fuzzyjoin
match_fun_stringdist <- function(v1, v2) {

  # Can't pass these parameters in from fuzzy_join because of multiple incompatible match_funs, so I set them here.
  ignore_case = FALSE
  method = "dl"
  max_dist = 99
  distance_col = "dist"

  if (ignore_case) {
    v1 <- stringr::str_to_lower(v1)
    v2 <- stringr::str_to_lower(v2)
  }

  # shortcut for Levenshtein-like methods: if the difference in
  # string length is greater than the maximum string distance, the
  # edit distance must be at least that large

  # length is much faster to compute than string distance
  if (method %in% c("osa", "lv", "dl")) {
    length_diff <- abs(stringr::str_length(v1) - stringr::str_length(v2))
    include <- length_diff <= max_dist

    dists <- rep(NA, length(v1))

    dists[include] <- stringdist::stringdist(v1[include], v2[include], method = method)
  } else {
    # have to compute them all
    dists <- stringdist::stringdist(v1, v2, method = method)
  }
  ret <- dplyr::data_frame(include = (dists <= max_dist))
  if (!is.null(distance_col)) {
    ret[[distance_col]] <- dists
  }
  ret
}

and call fuzzy_join

fuzzy_join(data1, data2, 
           by = list(x = c("Address1", "AREACODE", "Year1"), y = c("Address2", "AREA_CODE", "Year2")), 
           match_fun = list(match_fun_stringdist, `==`, `<=`),
           mode = "left")
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!