Question
I'm very new to using spaCy. I have been reading the documentation for hours and I'm still not sure whether what I'm asking is possible. Anyway...
As the title says, is there a way to get the whole noun chunk that contains a given token? For example, given the sentence:
"Autonomous cars shift insurance liability toward manufacturers"
Would it be possible to get the "autonomous cars" noun chunk when all I have is the "cars" token? Here is an example snippet of the scenario I'm going for:
import spacy

nlp = spacy.load("en_core_web_sm")  # any English pipeline with a dependency parser
startingSentence = "Autonomous cars and magic wands shift insurance liability toward manufacturers"
doc = nlp(startingSentence)
noun_chunks = doc.noun_chunks
for token in doc:
    if token.dep_ == "dobj":
        print(token)  # this will print "liability"
        # Is it possible to do anything from here to actually get the "insurance liability" noun chunk?
Any help will be greatly appreciated. Thanks!
Answer 1:
You can easily find the noun chunk that contains the token you've identified by checking if the token is in one of the noun chunk spans:
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
interesting_token = doc[7]  # "liability"; or however you identify the token you want
for noun_chunk in doc.noun_chunks:
    if interesting_token in noun_chunk:
        print(noun_chunk)
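The membership test works because a noun chunk is a `Span` covering a contiguous range of token offsets, so it is equivalent to checking `noun_chunk.start <= interesting_token.i < noun_chunk.end`. The sketch below shows that offset check without spaCy, so it runs on its own. It assumes the chunks are sorted and non-overlapping, and the chunk offsets are hypothetical, hand-written stand-ins for `doc.noun_chunks`; the helper name `chunk_containing` is also made up for illustration. Sorted starts allow a binary search instead of a scan over every chunk.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# A chunk is a (start, end) pair of token offsets with end exclusive,
# mirroring spaCy's Span.start / Span.end. These are hypothetical stand-ins.
Chunk = Tuple[int, int]


def chunk_containing(chunks: List[Chunk], token_index: int) -> Optional[Chunk]:
    """Return the chunk whose [start, end) range contains token_index, or None.

    Hypothetical helper; assumes chunks are sorted by start and do not overlap.
    """
    starts = [start for start, _ in chunks]
    # Position of the last chunk that starts at or before token_index.
    pos = bisect_right(starts, token_index) - 1
    if pos >= 0:
        start, end = chunks[pos]
        if start <= token_index < end:
            return chunks[pos]
    return None  # the token is not inside any chunk (e.g. a verb)


words = ["Autonomous", "cars", "and", "magic", "wands", "shift",
         "insurance", "liability", "toward", "manufacturers"]
# Assumed chunk offsets: "Autonomous cars", "magic wands",
# "insurance liability", "manufacturers".
chunks = [(0, 2), (3, 5), (6, 8), (9, 10)]

hit = chunk_containing(chunks, 7)  # token 7 is "liability"
if hit is not None:
    print(" ".join(words[hit[0]:hit[1]]))  # prints "insurance liability"
```

With a real `Doc`, the same lookup could use `[(nc.start, nc.end) for nc in doc.noun_chunks]` as the chunk list and `token.i` as the index; the result still depends on how the parser analyzed the sentence.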
The output is not correct with en_core_web_sm and spaCy 2.0.18, because "shift" isn't identified as a verb, so you get:
magic wands shift insurance liability
With en_core_web_md, it's correct:
insurance liability
(It makes sense for the documentation (https://spacy.io/usage/linguistic-features#noun-chunks) to include examples with real ambiguities, because that's a realistic scenario, but it's confusing for new users when the examples are so ambiguous that the analysis is unstable across versions and models.)
Source: https://stackoverflow.com/questions/55307452/is-there-a-way-to-retrieve-the-whole-noun-chunk-using-a-root-token-in-spacy