Apply fuzzy matching across a dataframe column and save results in a new column

 ̄綄美尐妖づ submitted on 2019-11-27 01:30:20

I couldn't tell what you were doing. This is how I would do it.

import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

Create a series of tuples to compare:

compare = pd.MultiIndex.from_product([df1['Company'],
                                      df2['FDA Company']]).to_series()
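To see what this produces, here is a small sketch with made-up data (the company names below are hypothetical, not taken from the question). The series holds one entry per pair, so its length is len(df1) * len(df2), which grows quickly for large frames.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df1 = pd.DataFrame({'Company': ['Apple Inc', 'Pfizer']})
df2 = pd.DataFrame({'FDA Company': ['Pfizer Inc', 'Apple Inc.']})

# Cartesian product of the two columns; each value is a (Company, FDA Company) tuple
compare = pd.MultiIndex.from_product([df1['Company'],
                                      df2['FDA Company']]).to_series()

n_pairs = len(compare)        # one entry per pair of names
first_pair = compare.iloc[0]  # a plain tuple that can be unpacked with *tup
print(compare.tolist())
```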

Create a special function to calculate fuzzy metrics and return a series.

def metrics(tup):
    # tup is one (Company, FDA Company) pair; both scores range from 0 to 100
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

Apply metrics to the compare series. The result is a dataframe with one row per pair and the columns ratio and token:

compare.apply(metrics)
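Returning a Series from every call is convenient, but it can be slow when there are many pairs. Here is a minimal sketch of an alternative that builds the same kind of table from plain tuples. The sample data is made up, pair_scores is a hypothetical helper name, and the except branch is only a rough standard-library stand-in (difflib) for when fuzzywuzzy is not installed; its numbers can differ slightly from fuzzywuzzy's.

```python
from difflib import SequenceMatcher

import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio, token_sort_ratio = fuzz.ratio, fuzz.token_sort_ratio
except ImportError:
    # Assumption: rough stand-ins, not fuzzywuzzy's exact algorithms
    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

    def token_sort_ratio(a, b):
        def norm(s):
            return ' '.join(sorted(s.lower().split()))
        return ratio(norm(a), norm(b))

# Hypothetical sample data
df1 = pd.DataFrame({'Company': ['Co Merck', 'Pfizer']})
df2 = pd.DataFrame({'FDA Company': ['Merck Co', 'Pfizer Inc']})

compare = pd.MultiIndex.from_product([df1['Company'],
                                      df2['FDA Company']]).to_series()


def pair_scores(tup):
    # Return a plain tuple instead of a Series for each pair
    return ratio(*tup), token_sort_ratio(*tup)


# One row per pair, one column per metric, same index as compare
scores = pd.DataFrame([pair_scores(tup) for tup in compare],
                      index=compare.index,
                      columns=['ratio', 'token'])
print(scores)
```

In this sample the token score ignores word order, so the pair ('Co Merck', 'Merck Co') gets a full token score while its ratio score is lower.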

There are a bunch of ways to do this next part:

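Get the closest match in df1 for each row of df2 (the result has one row per 'FDA Company' value)

compare.apply(metrics).unstack().idxmax().unstack(0)

Get the closest match in df2 for each row of df1 (the result has one row per 'Company' value)

compare.apply(metrics).unstack(0).idxmax().unstack(0)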
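The question title asks for the result to be saved in a new column, so here is a minimal sketch of one way to do that with a single metric. The sample data and the column names best_match and best_score are made up for illustration. The names are assumed to be unique within each frame, because unstack raises on duplicate index entries. The except branch is only a rough standard-library stand-in for when fuzzywuzzy is not installed.

```python
from difflib import SequenceMatcher

import pandas as pd

try:
    from fuzzywuzzy import fuzz
    score = fuzz.ratio
except ImportError:
    # Assumption: rough stand-in, not fuzzywuzzy's exact algorithm
    def score(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# Hypothetical sample data
df1 = pd.DataFrame({'Company': ['Apple Inc', 'Pfizer', 'Merck and Co']})
df2 = pd.DataFrame({'FDA Company': ['Pfizer Inc', 'Merck & Co', 'Apple Inc.']})

compare = pd.MultiIndex.from_product([df1['Company'],
                                      df2['FDA Company']]).to_series()

# Score every pair once and keep the result instead of recomputing it
scores = compare.apply(lambda tup: score(*tup))

# Rows are df1 companies, columns are df2 companies
matrix = scores.unstack()

# For each df1 company: the df2 name with the highest score, and that score
best = pd.DataFrame({'best_match': matrix.idxmax(axis=1),
                     'best_score': matrix.max(axis=1)})

# Look up each df1 row by its company name and attach the new columns
df1 = df1.join(best, on='Company')
print(df1)
```

A cutoff on best_score (for example, blanking best_match when the score is low) is a sensible follow-up, since idxmax always returns some name even when nothing is a good match.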
