Question
I am trying to use the following code to extract entities from the text stored in a DataFrame:
for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            ...  # here I need to store entity.text, but only for the first GPE
I need to store the text of the first GPE next to its corresponding row of text. For instance, if the following is the text at index 0 of df['Text']:
Match between USA and Canada was postponed
then I need only the first location (USA) in another column, such as df['Place'], at the same index as the text, i.e. 0. df['Place'] does not exist in the DataFrame yet, so it will be created when the value is assigned. I have tried the following code, but it fills the whole column with one and the same value instead of one value per row:
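for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            # Assigning one string to a column sets it on every row,
            # and each later GPE overwrites the whole column again.
            df['Place'] = entity.text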
I have also tried appending the text to a list with e_list.append(entity.text), but that appends every entity found in the text, not just the first one.
How can I store only the first entity, at the index of its corresponding text? Thank you.
Answer 1:
You can get all the GPE entities for each entry by using Series.apply on the Text column:
df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
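Why apply works where the loop in the question did not: apply returns a Series with one value per row (aligned on the DataFrame's index), whereas assigning a single string to a column broadcasts that string to every row. A small pandas-only sketch of the difference, with hard-coded strings standing in for the entity texts:

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Match between USA and Canada was postponed', 'No ents']})

# A single string is broadcast to every row, which is what the
# question's loop does on each iteration.
df['Place'] = 'USA'
print(df['Place'].tolist())  # ['USA', 'USA']

# A list (or Series) with one value per row fills each row separately.
df['Place'] = ['USA', '']
print(df['Place'].tolist())  # ['USA', '']
```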
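If you are only interested in the first entity of each entry, use the following (the or [''] supplies an empty string when no GPE is found, so [0] never fails on an empty list):
df['Place'] = df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])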
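A possible variant, sketched under assumptions: it uses next() with a default value so it stops at the first GPE instead of building the full list, and it also shows the question's e_list loop fixed to append exactly one value per row. The helper name first_gpe is hypothetical, and the pipeline is a stand-in (spacy.blank plus an EntityRuler with two hand-written patterns, using the spaCy v3 add_pipe API) so the example runs without a trained model; a real model will find its own entities.

```python
import pandas as pd
import spacy

# Stand-in pipeline (assumption, spaCy v3): a blank English pipeline plus an
# EntityRuler that tags "USA" and "Canada" as GPE, so this runs without
# downloading a trained model. With a model, use spacy.load(...) instead.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "USA"},
    {"label": "GPE", "pattern": "Canada"},
])

df = pd.DataFrame({'Text': ['Match between USA and Canada was postponed', 'No ents']})


def first_gpe(doc):
    """Hypothetical helper: text of the first GPE entity in doc, or ''."""
    # next() stops at the first match; its second argument is the default
    # returned when the generator yields nothing.
    return next((ent.text for ent in doc.ents if ent.label_ == 'GPE'), '')


# Same result as the apply() one-liner: one value per row.
df['Place'] = df['Text'].apply(lambda text: first_gpe(nlp(text)))

# Loop version of the e_list approach: append exactly one value per row.
places = []
for doc in nlp.pipe(df['Text'].tolist()):  # nlp.pipe processes texts in batches
    for ent in doc.ents:
        if ent.label_ == 'GPE':
            places.append(ent.text)  # keep only the first GPE...
            break                    # ...and stop scanning this doc
    else:
        # Runs only when the inner loop finished without break (no GPE found).
        places.append('')
df['Place'] = places

print(df)
```

Because places has exactly one entry per row, in the same order as df, assigning it to df['Place'] keeps each location on the row of the text it came from.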
Here is a test snippet:
import spacy
import pandas as pd

nlp = spacy.load('en_core_web_sm')  # assumes the small English model is installed
df = pd.DataFrame({'Text': ['Match between USA and Canada was postponed', 'No ents']})
df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
# => 0    [USA, Canada]
#    1               []
#    Name: Text, dtype: object

df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
# => 0    USA
#    1
#    Name: Text, dtype: object
Source: https://stackoverflow.com/questions/65406519/how-to-select-only-first-entity-extracted-from-spacy-entities