Finding the head of a noun phrase in NLTK and a Stanford parse according to the rules for finding the head of an NP

予麋鹿 2021-02-14 03:44

Generally, the head of a noun phrase is the rightmost noun of the NP; in the tree shown below, "tree" is the head of the parent NP. So

            ROOT                                  


        
2 Answers
  •  轻奢々 (original poster)
    2021-02-14 04:15

    NLTK has a built-in way to turn a bracketed parse string into a Tree object, Tree.fromstring (http://www.nltk.org/_modules/nltk/tree.html), see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L541.

    >>> from nltk.tree import Tree
    >>> parsestr='(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
    >>> for i in Tree.fromstring(parsestr).subtrees():
    ...     if i.label() == 'NP':
    ...         print(i)
    ... 
    (NP
      (NP (DT The) (JJ old) (NN oak) (NN tree))
      (PP (IN from) (NP (NNP India))))
    (NP (DT The) (JJ old) (NN oak) (NN tree))
    (NP (NNP India))
    
    
    >>> for i in Tree.fromstring(parsestr).subtrees():
    ...     if i.label() == 'NP':
    ...         print(i.leaves())
    ... 
    ['The', 'old', 'oak', 'tree', 'from', 'India']
    ['The', 'old', 'oak', 'tree']
    ['India']
    

    Note that the rightmost noun is not always the head noun of an NP, e.g.

    >>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP (DT a) (NN talk)))))'
    >>> Tree.fromstring(s)
    Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])
    >>> for i in Tree.fromstring(s).subtrees():
    ...     if i.label() == 'NP':
    ...         print(i.leaves()[-1])
    ... 
    Magnificent
    talk
    

    Arguably, Magnificent can still be the head noun. Another example is when the NP includes a relative clause:

    (NP (NP the person) that gave (NP the talk)) went home

    The head noun of the subject is person, but the last leaf node of the NP "the person that gave the talk" is talk.
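
One way to avoid the relative-clause problem is to look only at the direct children of an NP instead of all of its leaves. The following is a minimal sketch, not a full set of head-finding rules: np_head is a hypothetical helper name, the three-step fallback order is a simplifying assumption, and the fully tagged relative-clause tree is an assumed bracketing of the example above (the original gives only a partial one).

```python
from nltk.tree import Tree

def np_head(np):
    """Guess the head word of an NP subtree (heuristic sketch only)."""
    children = [c for c in np if isinstance(c, Tree)]
    # 1. Prefer the rightmost direct child with a noun tag (NN, NNS, NNP, NNPS).
    for child in reversed(children):
        if child.label().startswith('NN'):
            return child[0]
    # 2. Otherwise descend into the leftmost nested NP.
    for child in children:
        if child.label() == 'NP':
            return np_head(child)
    # 3. Last resort: fall back to the rightmost leaf.
    return np.leaves()[-1]

parsestr = ('(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) '
            '(PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))')
for np in Tree.fromstring(parsestr).subtrees(lambda t: t.label() == 'NP'):
    print(np_head(np), '<-', ' '.join(np.leaves()))

# Assumed full bracketing of the relative-clause example.
rel = Tree.fromstring('(NP (NP (DT the) (NN person)) (SBAR (WHNP (WDT that)) '
                      '(S (VP (VBD gave) (NP (DT the) (NN talk))))))')
print(np_head(rel))
```

With this sketch the outer NP of the relative-clause example yields person rather than talk, because talk is not a direct child of that NP. For an NP like Carnac the Magnificent it still returns Magnificent, so it does not settle cases where the head is not the rightmost noun child.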
