Is it possible to do a fuzzy match merge with Python pandas?

[愿得一人] 2020-11-22 01:17

I have two DataFrames which I want to merge based on a column. However, due to alternate spellings, different number of spaces, absence/presence of diacritical marks, I woul

11 Answers
  •  醉话见心
    2020-11-22 02:03

    There is a package called fuzzy_pandas that supports the levenshtein, jaro, metaphone, and bilenko matching methods. The package's documentation includes some great examples.

    import pandas as pd
    import fuzzy_pandas as fpd

    # Two frames whose keys differ by small spelling variations
    df1 = pd.DataFrame({'Key': ['Apple', 'Banana', 'Orange', 'Strawberry']})
    df2 = pd.DataFrame({'Key': ['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})

    # Fuzzy-join the frames on 'Key' using the levenshtein method
    results = fpd.fuzzy_merge(df1, df2,
                              left_on='Key',
                              right_on='Key',
                              method='levenshtein',
                              threshold=0.6)

    results.head()
    
    
          Key       Key
    0   Apple      Aple
    1  Banana  Bannanna
    2  Orange      Orag
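
If installing an extra package is not an option, a rough equivalent can be sketched with the standard library's difflib plus an ordinary pandas merge. This is only a sketch and not how fuzzy_pandas works internally: the helper `closest_key` and the column names `Key_left` / `Key_right` are made up for illustration, and difflib's similarity ratio is not a Levenshtein score, so the same cutoff may accept different pairs than the package does.

```python
import difflib

import pandas as pd

df1 = pd.DataFrame({'Key': ['Apple', 'Banana', 'Orange', 'Strawberry']})
df2 = pd.DataFrame({'Key': ['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})


def closest_key(value, candidates, cutoff=0.6):
    """Hypothetical helper: return the candidate most similar to value,
    or None when no candidate reaches the cutoff (difflib ratio, 0 to 1)."""
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Rename the columns up front so both spellings survive the merge.
left = df1.rename(columns={'Key': 'Key_left'})
right = df2.rename(columns={'Key': 'Key_right'})

# Attach to every right-hand row the left-hand key it resembles most.
candidates = left['Key_left'].tolist()
right['Key_left'] = right['Key_right'].map(lambda v: closest_key(v, candidates))

# Drop rows without a close enough match, then do a normal exact merge.
merged = left.merge(right.dropna(subset=['Key_left']), on='Key_left', how='inner')
print(merged)
```

Each right-hand key is mapped to at most one left-hand key, and rows with no candidate above the cutoff are dropped before the merge. Raising `cutoff` makes the matching stricter; it is worth eyeballing the matched pairs before trusting the result.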
    
