Is it possible to do a fuzzy-match merge with Python pandas?

愿得一人 2020-11-22 01:17

I have two DataFrames which I want to merge based on a column. However, due to alternate spellings, different numbers of spaces, and the absence or presence of diacritical marks, I would like to be able to merge rows as long as their values are similar to one another.
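
Some of the mismatches listed here (extra spaces, diacritical marks, letter case) can be removed deterministically before any fuzzy matching, which leaves only genuine spelling differences for a similarity measure to handle. A minimal sketch, assuming plain Unicode text keys; the normalize_key helper and the sample frames are hypothetical, and this is key normalization followed by an exact merge, not fuzzy matching itself:

```python
import re
import unicodedata

import pandas as pd


def normalize_key(s: str) -> str:
    """Hypothetical helper: strip diacritics, collapse whitespace, lowercase."""
    # NFKD splits an accented character into its base letter plus a combining mark
    decomposed = unicodedata.normalize("NFKD", s)
    # Drop the combining marks, keeping only the base letters
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every whitespace run to one space, trim the ends, lowercase
    return re.sub(r"\s+", " ", no_marks).strip().lower()


# Hypothetical sample data with spacing, accent and case differences
df1 = pd.DataFrame({"number": [1, 2, 3],
                    "name": ["Café  Nero", "São Paulo", "New York"]})
df2 = pd.DataFrame({"letter": ["a", "b", "c"],
                    "name": ["cafe nero", "Sao  Paulo", "new york "]})

# Merge on normalized copies of the key so the original spellings are kept
merged = (
    df1.assign(key=df1["name"].map(normalize_key))
    .merge(df2.assign(key=df2["name"].map(normalize_key)),
           on="key", suffixes=("_1", "_2"))
    .drop(columns="key")
)
print(merged)
```

Whatever still differs after normalization (for example "too" versus "two") is what a similarity measure such as difflib's is then needed for.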

    闹比i (thread starter)
    2020-11-22 01:47

    Similar to @locojay's suggestion, you can apply difflib's get_close_matches to df2's index and then do a join:

    In [23]: import difflib 
    
    In [24]: difflib.get_close_matches
    Out[24]: <function difflib.get_close_matches>
    
    In [25]: df2.index = df2.index.map(lambda x: difflib.get_close_matches(x, df1.index)[0])
    
    In [26]: df2
    Out[26]: 
          letter
    one        a
    two        b
    three      c
    four       d
    five       e
    
    In [31]: df1.join(df2)
    Out[31]: 
           number letter
    one         1      a
    two         2      b
    three       3      c
    four        4      d
    five        5      e
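
The session above does not show how df1 and df2 were built, and indexing [0] on the result of get_close_matches raises an IndexError whenever no candidate reaches the default cutoff of 0.6. The following is a self-contained sketch of the same index-based approach; the sample frames and the closest helper are hypothetical, and a label with no match is simply left unchanged so that it shows up as NaN after the join instead of raising:

```python
import difflib

import pandas as pd

# Hypothetical frames: df2's index labels are misspelled versions of df1's
df1 = pd.DataFrame({"number": [1, 2, 3, 4, 5]},
                   index=["one", "two", "three", "four", "five"])
df2 = pd.DataFrame({"letter": ["a", "b", "c", "d", "e"]},
                   index=["one", "too", "three", "fours", "five"])


def closest(label, choices, cutoff=0.6):
    """Return the best fuzzy match for label, or label itself if none clears cutoff."""
    matches = difflib.get_close_matches(label, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else label


# Rewrite df2's index in terms of df1's labels, then align the frames on the index
df2.index = df2.index.map(lambda x: closest(x, df1.index))
joined = df1.join(df2)
print(joined)
```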
    


    If these were columns, you could in the same vein apply the function to the column and then merge:

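    import difflib
    from pandas import DataFrame

    df1 = DataFrame([[1,'one'],[2,'two'],[3,'three'],[4,'four'],[5,'five']], columns=['number', 'name'])
    df2 = DataFrame([['a','one'],['b','too'],['c','three'],['d','fours'],['e','five']], columns=['letter', 'name'])

    # Replace each name in df2 with its closest match in df1 (IndexError if none reaches the 0.6 cutoff)
    df2['name'] = df2['name'].apply(lambda x: difflib.get_close_matches(x, df1['name'])[0])
    # Merges on the shared 'name' column
    df1.merge(df2)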
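
A variant of the column approach, sketched under the assumption that you want to keep the original spellings and also see which rows failed to match: store the best match in a separate column and left-merge on it. The name_match column, the best_match helper and the extra "sixx" row are hypothetical additions for illustration:

```python
import difflib

import pandas as pd

df1 = pd.DataFrame([[1, "one"], [2, "two"], [3, "three"], [4, "four"], [5, "five"]],
                   columns=["number", "name"])
# "sixx" is a hypothetical extra row with no close match in df1
df2 = pd.DataFrame([["a", "one"], ["b", "too"], ["c", "three"],
                    ["d", "fours"], ["e", "five"], ["f", "sixx"]],
                   columns=["letter", "name"])

choices = df1["name"].tolist()


def best_match(name, cutoff=0.6):
    # None when nothing in choices reaches the similarity cutoff
    matches = difflib.get_close_matches(name, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Keep the raw spelling in 'name' and put the matched key in its own column
df2["name_match"] = df2["name"].map(best_match)
merged = df2.merge(df1.rename(columns={"name": "name_match"}),
                   on="name_match", how="left")
print(merged)
```

Rows with no match end up with NaN in number rather than raising; raising or lowering cutoff trades missed matches against false ones.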
    
