Is there a way to retrieve the whole noun chunk using a root token in spaCy?

Question

I'm very new to spaCy. I have been reading the documentation for hours and I'm still not sure whether what I'm asking is possible. Anyway...

As the title says, is there a way to get a noun chunk from a token that it contains? For example, given the sentence:

"Autonomous cars shift insurance liability toward manufacturers"

Would it be possible to get the "autonomous cars" noun chunk when all I have is the "cars" token? Here is an example snippet of the scenario I'm going for.

import spacy

nlp = spacy.load("en_core_web_sm")  # any English model with a parser

startingSentence = "Autonomous cars and magic wands shift insurance liability toward manufacturers"
doc = nlp(startingSentence)
noun_chunks = doc.noun_chunks

for token in doc:
    if token.dep_ == "dobj":
        print(token) # this will print "liability"

        # Is it possible to do anything from here to actually get the "insurance liability" noun chunk?

Any help will be greatly appreciated. Thanks!

Answer 1:

You can easily find the noun chunk that contains the token you've identified by checking if the token is in one of the noun chunk spans:

import spacy

nlp = spacy.load("en_core_web_md")  # see the note on model differences below

doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
interesting_token = doc[7] # "liability"; or however you identify the token you want
for noun_chunk in doc.noun_chunks:
    if interesting_token in noun_chunk:
        print(noun_chunk)

The output is not correct with en_core_web_sm and spaCy 2.0.18 because "shift" isn't identified as a verb, so you get:

magic wands shift insurance liability

With en_core_web_md, it's correct:

insurance liability

(It makes sense to include examples with real ambiguities in the documentation because that's a realistic scenario (https://spacy.io/usage/linguistic-features#noun-chunks), but it's confusing for new users if they're ambiguous enough that the analysis is unstable across versions/models.)
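
Since the result depends on the parse, one way to check the lookup itself is to run it on a hand-annotated Doc instead of model output. The following is a minimal sketch, assuming spaCy v3 (where the Doc constructor accepts words, pos, heads and deps). The helper name chunk_for_token and the annotations are illustrative assumptions, not part of spaCy or of the original answer. It compares token offsets against the chunk boundaries, which should select the same chunk as the "in" check above.

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: no trained model, but its vocab still carries
# the English noun-chunk rules.
nlp = spacy.blank("en")

words = ["Autonomous", "cars", "and", "magic", "wands", "shift",
         "insurance", "liability", "toward", "manufacturers"]
# Hand-written annotations (an assumption for illustration, not model output).
pos = ["ADJ", "NOUN", "CCONJ", "NOUN", "NOUN", "VERB",
       "NOUN", "NOUN", "ADP", "NOUN"]
heads = [1, 5, 1, 4, 1, 5, 7, 5, 5, 8]  # absolute index of each token's head
deps = ["amod", "nsubj", "cc", "compound", "conj", "ROOT",
        "compound", "dobj", "prep", "pobj"]
doc = Doc(nlp.vocab, words=words, pos=pos, heads=heads, deps=deps)


def chunk_for_token(token):
    """Return the noun chunk (a Span) that contains `token`, or None."""
    for chunk in token.doc.noun_chunks:
        # Span.start is inclusive and Span.end is exclusive.
        if chunk.start <= token.i < chunk.end:
            return chunk
    return None


for token in doc:
    if token.dep_ == "dobj":
        print(token.text, "->", chunk_for_token(token))
```

With these annotations the loop should report "insurance liability" for the "liability" token. For a token outside every noun chunk, such as the verb "shift", the helper returns None, so callers need to handle that case.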

Source: https://stackoverflow.com/questions/55307452/is-there-a-way-to-retrieve-the-whole-noun-chunk-using-a-root-token-in-spacy

Tags

python

nlp

spacy

dependency-parsing