问题
Good Morning,
I am using NLTK to get synonyms out of a frame of words.
print(df)
col_1 col_2
Book 5
Pen 5
Pencil 6
def get_synonyms(df, column_name):
df_1 = df["col_1"]
for i in df_1:
syn = wn.synsets(i)
for synset in list(wn.all_synsets('n'))[:2]:
print(i, "-->", synset)
print("-----------")
for lemma in synset.lemmas():
print(lemma.name())
ci = lemma.name()
return(syn)
And it does work, but I would like to get the following dataframe, with the first "n" synonyms, of each word in "col_1":
print(df_final)
col_1 synonym
Book booklet
Book album
Pen cage
...
I tried initializing an empty list, before both synset and lemma loop, and appending, but in both cases it didn't work; for example:
synonyms = []
for lemma in synset.lemmas():
print(lemma.name())
ci = lemma.name()
synonyms.append(ci)
回答1:
You can use:
from nltk.corpus import wordnet
from itertools import chain
def get_synonyms(df, column_name, N):
L = []
for i in df[column_name]:
syn = wordnet.synsets(i)
#flatten all lists by chain, remove duplicates by set
lemmas = list(set(chain.from_iterable([w.lemma_names() for w in syn])))
for j in lemmas[:N]:
#append to final list
L.append([i, j])
#create DataFrame
return (pd.DataFrame(L, columns=['word','syn']))
#add number of filtered synonyms
df1 = get_synonyms(df, 'col_1', 3)
print (df1)
word syn
0 Book record_book
1 Book book
2 Book Word
3 Pen penitentiary
4 Pen compose
5 Pen pen
6 Pencil pencil
来源:https://stackoverflow.com/questions/49646310/create-a-dataframe-with-nltk-synonyms