Stanford NLP parse tree format

Submitted by 我的梦境 on 2019-11-29 02:41:44

This particular output format of the Stanford Parser is called the "bracketed parse (tree)". It is meant to be read as a graph with

  • words as leaf nodes (e.g. As, an, accountant)
  • phrase/clause tags as labels on the nodes above them (e.g. S, NP, VP)
  • edges that link the nodes hierarchically, and
  • typically an artificial ROOT as the parse's TOP or root node, one that corresponds to no word in the sentence

(In this case you can read it as a Directed Acyclic Graph (DAG), and more specifically as a rooted tree, since every edge runs in one direction, from parent to child, and the edges never form a cycle.)
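
To make that reading concrete, here is a minimal sketch of parsing the bracketed format by hand in plain Python, with no library involved. The names parse_bracketed and leaves are hypothetical helpers made up for this illustration, and the sketch assumes well-formed input in which words never contain literal parentheses:

```python
import re


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Words (leaves) are plain strings; every other node is (label, [children]).
    Hypothetical helper for illustration only; assumes well-formed input.
    """
    # A token is "(", ")", or a run of characters that is neither
    # whitespace nor a parenthesis (that is, a label or a word).
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            # A bare token is a word, i.e. a leaf of the tree.
            word = tokens[pos]
            pos += 1
            return word
        pos += 1                 # consume "("
        label = tokens[pos]      # the first token after "(" is the label
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read_node())
        pos += 1                 # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def leaves(node):
    """Return the words of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


tree = parse_bracketed("(ROOT (S (NP (PRP I)) (VP (VBP want))))")
print(tree[0])       # ROOT
print(leaves(tree))  # ['I', 'want']
```

Each opening bracket starts a new labelled node and each closing bracket ends it, so the nesting of the brackets is exactly the parent/child hierarchy of the tree.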

There are libraries out there that read bracketed parses, e.g. NLTK's nltk.tree.Tree (http://www.nltk.org/howto/tree.html):

>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
                           ROOT                             
                            |                                
                            S                               
      ______________________|________                        
     |                  |            VP                     
     |                  |    ________|____                   
     |                  |   |             S                 
     |                  |   |             |                  
     |                  |   |             VP                
     |                  |   |     ________|___               
     PP                 |   |    |            VP            
  ___|___               |   |    |    ________|___           
 |       NP             NP  |    |   |            NP        
 |    ___|______        |   |    |   |         ___|_____     
 IN  DT         NN     PRP VBP   TO  VB       DT        NN  
 |   |          |       |   |    |   |        |         |    
 As  an     accountant  I  want  to make      a      payment

>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']

Note that if you're interested in specific nodes in the tree, identified by regex-like rules, you can use this very handy Java class, TregexPattern, to extract all such nodes:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/tregex/TregexPattern.html
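
If the rule is as simple as "every node with a given label", you may not need Tregex at all and can stay in NLTK. The following is a rough sketch rather than a Tregex equivalent: it uses Tree.subtrees() with a filter function to collect all NP nodes from the same parse. The variable noun_phrases is a name chosen for this example.

```python
from nltk.tree import Tree

output = ('(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) '
          '(VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))')
parsetree = Tree.fromstring(output)

# subtrees() visits the tree top-down, left to right; the filter keeps
# only the nodes whose label is NP. Joining a node's leaves gives the
# words that the phrase covers.
noun_phrases = [
    " ".join(subtree.leaves())
    for subtree in parsetree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['an accountant', 'I', 'a payment']
```

Anything that depends on relations between nodes (for example "an NP that is directly under a VP") would have to be written into the filter by hand here, and that is the kind of query Tregex patterns express directly.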
