I have a dictionary that maps each word to its closest related words.
I want to replace the related words in a string with the original word. Currently I am able to replace words in th
I think you can build a new dict of regex patterns and pass it to replace,
as in this answer:
d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}
# Map each variant, matched only as a whole whitespace-delimited token, to its original word
d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
df['col'] = df['col'].replace(d1, regex=True)
print(df)
                        col
0   North Indian Restaurant
1   South Indian Restaurant
2        Mexican Restaurant
3        Italian Restaurant
4                  Cafe Pub
5                 Irish Pub
6               Maggiee Pub
7           Jacky Craft Pub
8               Bristo 1888
9               Bristo 1888
10              Bristo 1888
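To see what the `(?<!\S)` and `(?!\S)` lookarounds are doing, here is a minimal sketch that applies the same kind of pattern dict with plain `re` instead of pandas. The `normalize` helper and the sample strings are made up for illustration; the loop only approximates what `replace(d1, regex=True)` does to each cell.

```python
import re

# Same mapping shape as in the answer: original word -> comma-separated variants.
d = {'Indian': 'India, Ind, ind.',
     'Pub': 'Bar, Baar, Beer'}

# Build {pattern: original word}. The lookarounds require whitespace or the
# start/end of the string on both sides, so only whole tokens can match.
d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1
      for k1, v1 in d.items() for k in v1.split(',')}

def normalize(text):
    # Hypothetical helper: apply every pattern in turn to one string.
    for pattern, word in d1.items():
        text = re.sub(pattern, word, text)
    return text

# Made-up sample strings; 'Barbecue' must stay untouched even though it starts with 'Bar'.
samples = ['North India Restaurant', 'Irish Bar', 'Barbecue Ind House']
results = [normalize(s) for s in samples]
print(results)
```

Without the lookarounds, `Bar` would also match inside `Barbecue`, and `Ind` would match inside an already replaced `Indian`.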
EDIT (Function for the above code):
def replace_words(d, col):
    d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
EDIT1:
If you get an error like:
regex error- missing ), unterminated subpattern at position 7
you need to escape the regex special characters in the keys with re.escape:
import re

def replace_words(d, col):
    d1 = {r'(?<!\S)' + re.escape(k.strip()) + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
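As a rough illustration of why escaping matters, the sketch below uses plain `re` on made-up inputs. The variant `'Bar (old'` and the string `'Bristo 1889'` are assumptions for the demo and not part of the original data; `'188.'` is one of the variants from the dict above.

```python
import re

# Hypothetical variant containing a regex metacharacter: an unbalanced '('.
variant = 'Bar (old'

# Unescaped, the '(' opens a group that is never closed, so compiling fails.
raw = r'(?<!\S)' + variant + r'(?!\S)'
try:
    re.compile(raw)
    raw_ok = True
except re.error:
    raw_ok = False

# Escaped, the same variant is matched literally.
escaped = r'(?<!\S)' + re.escape(variant) + r'(?!\S)'
fixed = re.sub(escaped, 'Pub', 'Irish Bar (old')

# Escaping also stops '.' from acting as a wildcard: unescaped '188.' matches
# any four-character token starting with 188, including '1889'.
dot_raw = re.sub(r'(?<!\S)' + '188.' + r'(?!\S)', '1888', 'Bristo 1889')
dot_escaped = re.sub(r'(?<!\S)' + re.escape('188.') + r'(?!\S)', '1888', 'Bristo 1889')

print(raw_ok, fixed, dot_raw, dot_escaped)
```

So even when no error is raised, the escaped version is the safer choice whenever the variants contain characters such as `.`, `(`, `+` or `?`.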