fuzzy match between 2 columns (Python)

六月ゝ 毕业季﹏ 提交于 2019-12-23 03:09:52

问题


I have a pandas dataframe called "df_combo" which contains columns "worker_id", "url_entrance", "company_name". I am trying to produce an output column that would tell me if the URLs in "url_entrance" column contains any word in "company_name" column. Even a close match like fuzzywuzzy would work.

For example, if the URL is "www.grandhotelseattle.com" and the "company_name" is "Hotel Prestige Seattle", then the fuzz ratio might be somewhere 70-80.

I have tried the following script: >>>fuzz.ratio(df_combo['url_entrance'],df_combo['company_name']) but it returns only 1 number which is the overall fuzz ratio for the whole column. I would like to have fuzz ratio for every row and store those ratios in a new column.


回答1:


Thanks everyone for your inputs. I have solved my problem! The link that "agg3l" provided was helpful. The "TypeError" I saw was because either the "url_entrance" or "company_name" has some floating types in certain rows. I converted both columns to string using the following scripts, re-ran the fuzz.ratio script and got it to work!

df_combo['url_entrance']=df_combo['url_entrance'].astype(str) df_combo['company_name']=df_combo['company_name'].astype(str)



来源:https://stackoverflow.com/questions/40143675/fuzzy-match-between-2-columns-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!