I have two DataFrames which I want to merge based on a column. However, due to alternate spellings, different number of spaces, absence/presence of diacritical marks, I woul
As a heads up, this basically works, except if no match is found, or if you have NaNs in either column. Instead of directly applying get_close_matches
, I found it easier to apply the following function. The choice of NaN replacements will depend a lot on your dataset.
def fuzzy_match(a, b):
left = '1' if pd.isnull(a) else a
right = b.fillna('2')
out = difflib.get_close_matches(left, right)
return out[0] if out else np.NaN