Joining two datasets using fuzzy logic

怎甘沉沦 提交于 2019-11-26 16:56:51

问题


I’m trying to do a fuzzy logic join in R between two datasets:

  • first data set has the name of a location and a column called config
  • second data set has the name of a location and two additional attributes that need to be summarized before they are joined to the first data set.

I would like to use the name column to join between the two data sets. However the name column may have additional or leading characters in either data set or have one word contained inside of a larger word. So for example if we looked at these two data sets, I'd like the name OPAL to join to the OPALAS, and SAUSALITO Y to join to SAUSALITO.

Dataset1:    
     Name           Config
     ALTO D         BB
     CONTRA         ST
     EIGHT A        DD
     OPALAS         BB
     SAUSALITO Y    AA
     SOLANO J       ST

Dataset2:    
    Name       Age     Rank
    ALTO D     50      2
    ALTO D     20      6
    CONTRA     10      10
    CONTRA     15      15
    EIGHTH     18      21
    OPAL       19      4
    SAUSALITO  2       12
    SOLANO     34      43

Code for Summarizing of Dataset2

Data2a <- summaryBy(Age ~ Name,FUN=c(mean), data=Data2,na.rm=TRUE)
Data2b <- summaryBy(Rank ~ Name,FUN=c(sum), data=Data2,na.rm=TRUE)
Data2 <- data.frame(Data2a$Name, Data2a$Age.mean, Data2b$Rank.sum)

Desired Outcome:
    Name        Config  Age   Rank
    ALTO D      BB      35    8
    CONTRA      ST      12.5  25
    EIGHT A     DD      18    21
    OPALAS      BB      19    4
    SAUSALITO Y AA      12    5
    SOLANO J    ST      34    43

回答1:


I was able to join the two datasets, using the fuzzyjoin package:

library(fuzzyjoin)
stringdist_inner_join(Dataset1, Data2,
     by ="Name", distance_col = NULL)


来源:https://stackoverflow.com/questions/44359860/joining-two-datasets-using-fuzzy-logic

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!