I have two DataFrames which I want to merge based on a column. However, due to alternate spellings, different number of spaces, absence/presence of diacritical marks, I woul
There is a package called fuzzy_pandas
that can use levenshtein
, jaro
, metaphone
and bilenco
methods. With some great examples here
import pandas as pd
import fuzzy_pandas as fpd
df1 = pd.DataFrame({'Key':['Apple', 'Banana', 'Orange', 'Strawberry']})
df2 = pd.DataFrame({'Key':['Aple', 'Mango', 'Orag', 'Straw', 'Bannanna', 'Berry']})
results = fpd.fuzzy_merge(df1, df2,
left_on='Key',
right_on='Key',
method='levenshtein',
threshold=0.6)
results.head()
Key Key
0 Apple Aple
1 Banana Bannanna
2 Orange Orag