Is it possible to do a fuzzy match merge with Python pandas?

[愿得一人] 2020-11-22 01:17

I have two DataFrames which I want to merge based on a column. However, due to alternate spellings, different number of spaces, absence/presence of diacritical marks, I woul

11 answers
  •  面向向阳花
    2020-11-22 01:37

    I have written a Python package which aims to solve this problem:

    pip install fuzzymatcher

    You can find the repo here and docs here.

    Basic usage:

    Given two dataframes df_left and df_right, which you want to fuzzy join, you can write the following:

    import fuzzymatcher
    
    # Columns to match on from df_left
    left_on = ["fname", "mname", "lname",  "dob"]
    
    # Columns to match on from df_right
    right_on = ["name", "middlename", "surname", "date"]
    
    # The link table potentially contains several matches for each record
    fuzzymatcher.link_table(df_left, df_right, left_on, right_on)
    

    Or if you just want to link on the closest match:

    fuzzymatcher.fuzzy_left_join(df_left, df_right, left_on, right_on)
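
For readers who want to see the underlying idea without installing anything, here is a minimal standard-library sketch. It is not how fuzzymatcher works internally; it only illustrates normalising spacing and diacritics and then picking the closest candidate with `difflib`. The helper names `normalize` and `closest_match`, the sample names, and the `cutoff` value of 0.8 are all assumptions made for illustration.

```python
import difflib
import unicodedata


def normalize(text):
    """Lowercase, strip diacritical marks, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.lower().split())


def closest_match(value, candidates, cutoff=0.8):
    """Return the candidate most similar to value, or None if none reaches cutoff."""
    # Map each normalised form back to the original spelling (hypothetical helper).
    by_key = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(
        normalize(value), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[hits[0]] if hits else None


# Illustrative data only, not taken from the question.
left_names = ["José  García", "Anna Müller", "Bob Stone"]
right_names = ["Jose Garcia", "Ana Muller", "Robert Stone"]

# Left-hand key -> best right-hand key (or None when nothing is close enough).
mapping = {name: closest_match(name, right_names) for name in left_names}
```

With a mapping like this, one option is to add the matched key to the left DataFrame with `Series.map` and then do an ordinary `DataFrame.merge` on that column. The comparison is quadratic in the number of keys, so it is only a reasonable choice for small tables.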
    
