Quicker way to perform fuzzy string match in pandas

Submitted by 馋奶兔 on 2019-11-29 08:49:28

Let's try difflib from the Python standard library:

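import difflib
from functools import partial

# names_df['names'] holds the reference names; extra_names['not_matching']
# holds the strings to match (both DataFrames come from the question).
f = partial(
    difflib.get_close_matches, possibilities=names_df['names'].tolist(), n=1)

# get_close_matches returns a list with at most one name (n=1), or an empty
# list when nothing reaches the default cutoff of 0.6. .str[0] takes that
# name (NaN for an empty list) and fillna('') blanks out the misses.
matches = extra_names['not_matching'].map(f).str[0].fillna('')
# Similarity of each (best match, original string) pair, between 0 and 1.
scores = [
    difflib.SequenceMatcher(None, x, y).ratio()
    for x, y in zip(matches, extra_names['not_matching'])
]

extra_names.assign(best=matches, score=scores)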

       not_matching               best     score
0         Vij Sales        Vijay Sales  0.900000
1  Crom Electronics  Croma Electronics  0.969697
2       REL Digital   Reliance Digital  0.666667
3        Bajaj Elec  Bajaj Electronics  0.740741
4     Reliance Digi   Reliance Digital  0.896552
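For a self-contained check, the same steps can be wrapped in a function and run on the strings from the output above. The helper name best_matches is hypothetical. The choices list is an assumption: it holds only the four names visible in the best column, because the full names_df is not shown. The extra query 'XYZ' is made up to show what happens when nothing clears get_close_matches' default cutoff of 0.6.

```python
import difflib
from functools import partial

import pandas as pd


def best_matches(queries: pd.Series, choices: list[str], cutoff: float = 0.6) -> pd.DataFrame:
    """Hypothetical helper: same difflib approach as the answer above."""
    # n=1 keeps only the closest choice; cutoff is get_close_matches' own parameter.
    f = partial(difflib.get_close_matches, possibilities=choices, n=1, cutoff=cutoff)
    # Each call returns a list; .str[0] takes its first item (NaN if the list is empty).
    best = queries.map(f).str[0].fillna('')
    # Score each (best, query) pair; an unmatched query is compared with '' and gets 0.0.
    score = [difflib.SequenceMatcher(None, b, q).ratio() for b, q in zip(best, queries)]
    return pd.DataFrame({'not_matching': queries, 'best': best, 'score': score})


# Assumed reference names: only those that appear in the output table above.
choices = ['Vijay Sales', 'Croma Electronics', 'Reliance Digital', 'Bajaj Electronics']
# Queries from the table, plus a hypothetical 'XYZ' that should match nothing.
queries = pd.Series(
    ['Vij Sales', 'Crom Electronics', 'REL Digital', 'Bajaj Elec', 'Reliance Digi', 'XYZ'],
    name='not_matching',
)

result = best_matches(queries, choices)
print(result)
```

With only these four choices, the best column and the scores should agree with the table above. 'XYZ' comes back with an empty best and a score of 0.0.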