Is it possible to do a fuzzy-match merge with Python pandas?

[愿得一人] 2020-11-22 01:17

I have two DataFrames that I want to merge on a column. However, due to alternate spellings, differing numbers of spaces, and the absence/presence of diacritical marks, I woul

11 answers
  •  广开言路
    2020-11-22 02:00

    You can use d6tjoin for that:

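    import pandas as pd
    import d6tjoin.top1
    # sample frames reconstructed from the output below: keys are spelled differently
    df1 = pd.DataFrame({'number': [1, 2, 3, 4, 5]}, index=['one', 'two', 'three', 'four', 'five'])
    df2 = pd.DataFrame({'letter': list('abcde')}, index=['one', 'too', 'three', 'fours', 'five'])
    # fuzzy-join on the reset index column; the 'merged' entry holds the best-match join
    d6tjoin.top1.MergeTop1(df1.reset_index(), df2.reset_index(),
           fuzzy_left_on=['index'], fuzzy_right_on=['index']).merge()['merged']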
    

       index  number index_right letter
    0    one       1         one      a
    1    two       2         too      b
    2  three       3       three      c
    3   four       4       fours      d
    4   five       5        five      e
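    The result above pairs each left key with its single closest right key ("two" with "too", "four" with "fours"). As a rough illustration of that top-1 idea without d6tjoin, here is a minimal sketch using the standard library's difflib, which scores candidates by SequenceMatcher's similarity ratio. This is a swapped-in technique, not d6tjoin's algorithm; the helper name best_match_map and the 0.6 cutoff (difflib's default) are assumptions for illustration.

```python
import difflib


def best_match_map(left_keys, right_keys, cutoff=0.6):
    """Map each left key to its most similar right key, or None.

    Hypothetical helper: uses difflib.get_close_matches (SequenceMatcher
    ratio), NOT d6tjoin's matching logic.
    """
    right_keys = list(right_keys)
    mapping = {}
    for key in left_keys:
        # n=1 keeps only the top-scoring candidate whose ratio >= cutoff
        matches = difflib.get_close_matches(key, right_keys, n=1, cutoff=cutoff)
        mapping[key] = matches[0] if matches else None
    return mapping


left = ["one", "two", "three", "four", "five"]
right = ["one", "too", "three", "fours", "five"]
print(best_match_map(left, right))
# {'one': 'one', 'two': 'too', 'three': 'three', 'four': 'fours', 'five': 'five'}
```

    With pandas, one way to use such a mapping would be to add it as a key column, e.g. df1['key'] = df1.index.map(mapping), and then run an ordinary pd.merge(df1, df2, left_on='key', right_index=True).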

    It has a variety of additional features such as:

    • check join quality before and after the join
    • customize the similarity function, e.g. edit distance vs. Hamming distance
    • specify a maximum distance
    • multi-core computation
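    To show why the choice of similarity function and a maximum distance matter, here is a small pure-Python sketch of a top-1 match with a pluggable distance (Levenshtein edit distance or Hamming distance) and an optional threshold. The function and parameter names (levenshtein, hamming, top1, max_distance) are hypothetical and are not d6tjoin's API; treating Hamming distance between strings of different lengths as infinite is also an assumption made here, since Hamming distance is only defined for equal lengths.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def hamming(a, b):
    """Hamming distance; assumption: unequal lengths count as no match."""
    if len(a) != len(b):
        return float("inf")
    return sum(ca != cb for ca, cb in zip(a, b))


def top1(key, candidates, distance=levenshtein, max_distance=None):
    """Return the closest candidate, or None if it is farther than max_distance."""
    best = min(candidates, key=lambda c: distance(key, c))
    if max_distance is not None and distance(key, best) > max_distance:
        return None
    return best


right = ["one", "too", "three", "fours", "five"]
print(top1("four", right))                                  # fours
print(top1("four", right, distance=hamming))                # five
print(top1("four", right, distance=hamming, max_distance=1))  # None
```

    Note how Hamming distance only compares equal-length strings, so "four" lands on "five" instead of "fours"; edit distance handles the extra character, and a maximum distance turns a poor best match into "no match" instead of a wrong join.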

    For details, see:

    • MergeTop1 examples - Best match join examples notebook
    • PreJoin examples - Examples for diagnosing join problems
