How to 'fuzzy' match strings when merging two DataFrames in pandas

小鲜肉 2021-01-07 06:31

I have two DataFrames, df1 and df2.

df1 = pd.DataFrame({'Name': ['Adam Smith', 'Anne Kim', 'John Weber', 'Ian Ford'],
             


        
2 answers
  • 2021-01-07 06:52

    I would use fuzzywuzzy here: look up the closest name in df1 for each name in df2, then merge on that key.

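    from fuzzywuzzy import process

    # For each name in df2, take the best-scoring name in df1 as the merge key.
    df2['key'] = df2['Name'].apply(lambda x: process.extractOne(x, df1['Name'])[0])

    # Merge on the matched key; the two Name columns come back as Name_x / Name_y.
    df2.merge(df1, left_on='key', right_on='Name')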
    Out[1238]: 
            Name_x gender         key  Age      Name_y
    0   adam Smith      M  Adam Smith   43  Adam Smith
    1    Annie Kim      F    Anne Kim   21    Anne Kim
    2  John  Weber      M  John Weber   55  John Weber
    3     Ian Ford      M    Ian Ford   24    Ian Ford
    
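One thing to watch with the best-match approach: it always returns some name, even when a row in df2 has no real counterpart in df1. Below is a minimal sketch of a thresholded variant. It swaps in the standard library's `difflib` for fuzzywuzzy so it runs without extra installs. `closest_name`, `fuzzy_merge`, and the 0.8 cutoff are illustrative names and values, not part of any library.

```python
from difflib import get_close_matches


def closest_name(name, candidates, cutoff=0.8):
    """Return the candidate most similar to `name`, or None if none reaches `cutoff`.

    The comparison ignores case and repeated whitespace.
    """
    def norm(s):
        return " ".join(s.lower().split())

    # Map each normalized form back to its original spelling.
    lookup = {norm(c): c for c in candidates}
    hits = get_close_matches(norm(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def fuzzy_merge(left, right, on="Name", cutoff=0.8):
    """Inner-merge two DataFrames, pairing each left value with its closest right value."""
    key = left[on].map(lambda x: closest_name(x, right[on], cutoff))
    # Rows without an acceptable match get key=None and drop out of the inner merge.
    return left.assign(key=key).merge(right, left_on="key", right_on=on)
```

With frames like the ones in the question, `fuzzy_merge(df2, df1)` would stand in for the two fuzzywuzzy lines. If you would rather stay with fuzzywuzzy, `process.extractOne` also accepts a `score_cutoff` argument and returns None when nothing scores above it.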
  • 2021-01-07 07:09

    I'm not sure a fuzzy match is what you need. If the names differ only in capitalization, title-casing both columns and then doing an exact merge may be enough:

    # Put both Name columns into title case so that case differences no longer matter.
    df1['Name'] = df1['Name'].str.title()
    df2['Name'] = df2['Name'].str.title()

    pd.merge(df1, df2, how='inner', on='Name')
    
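Title-casing only fixes capitalization. A stray double space (as in 'John  Weber') would still break an exact merge, and a real spelling difference ('Annie Kim' vs 'Anne Kim') needs fuzzy matching regardless. A small sketch that also cleans up whitespace, assuming the names are plain strings; `normalize_name` is a hypothetical helper, not a pandas function:

```python
def normalize_name(name):
    """Trim the ends, collapse repeated whitespace, and title-case a name."""
    # split() with no arguments splits on any run of whitespace and drops empty pieces.
    return " ".join(name.split()).title()


# Applied to both frames before the exact merge (assumes a 'Name' column of strings):
#   df1['Name'] = df1['Name'].map(normalize_name)
#   df2['Name'] = df2['Name'].map(normalize_name)
```

Note that `str.title()` also lowercases capitals inside a word, so a name like 'McDonald' becomes 'Mcdonald'. That is harmless when both columns are normalized the same way, but it does change the displayed value.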