Pandas: fuzzy detection of duplicates
Question:
How can I use fuzzy matching in pandas to detect duplicate rows efficiently? How can I find duplicates of one column vs. all the other ones without a gigantic for loop that converts row_i with toString() and then compares it to all the other rows?

Answer 1:
This is not pandas-specific, but within the Python ecosystem the dedupe Python library seems to do what you want. In particular, it allows you to compare each column of a row separately and then combine the information into a single probability score of a
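To illustrate the idea of scoring each column separately and combining the results into one number, here is a minimal standard-library sketch. It is not the dedupe library's API and does not produce a calibrated probability; it is a swapped-in heuristic that uses difflib.SequenceMatcher for per-column similarity and a weighted average to combine them. The function names, the column weights, the 0.85 threshold, and the "blocking" key (only rows that share a key are compared, which avoids scoring every pair in the table) are all hypothetical choices for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def column_similarity(a, b):
    """Similarity in [0, 1] between two cell values, compared as lowercase strings."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def row_score(row_a, row_b, weights):
    """Combine per-column similarities into one score via a weighted average.

    The weighting scheme is a hypothetical choice, not dedupe's learned model.
    """
    total = sum(weights.values())
    return sum(
        w * column_similarity(row_a[col], row_b[col]) for col, w in weights.items()
    ) / total


def fuzzy_duplicates(rows, weights, block_key, threshold=0.85):
    """Return sorted (i, j, score) tuples for row pairs scoring >= threshold.

    Rows are grouped by block_key(row); only rows in the same block are
    compared, so the work is quadratic per block instead of over the whole table.
    """
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[block_key(row)].append(idx)

    matches = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            score = row_score(rows[i], rows[j], weights)
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return sorted(matches)


rows = [
    {"name": "John Smith", "city": "New York"},
    {"name": "Jon Smith", "city": "New York"},
    {"name": "Jane Doe", "city": "Boston"},
    {"name": "J. Smith", "city": "NYC"},
]
weights = {"name": 2.0, "city": 1.0}  # hypothetical: name counts twice as much as city
pairs = fuzzy_duplicates(rows, weights, block_key=lambda r: r["city"][0].lower())
print(pairs)  # [(0, 1, 0.965)]
```

With a pandas DataFrame `df`, `df.to_dict("records")` produces this list-of-dicts shape, and the returned indices are positions in that list. The blocking key is the main trade-off: it keeps the comparison count manageable, but a true duplicate whose key differs (for example a typo in the first letter of the city) is never compared. dedupe addresses this by learning the blocking rules and column weights from labelled examples instead of hard-coding them.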