Question
I have a corpus of sentences that were preprocessed with Stanford's CoreNLP. One of the things it provides is each sentence's constituency-based parse tree. While I can understand a parse tree when it is drawn as a tree, I'm not sure how to read it in this format:
E.g.:
(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))
The original sentence is:
sent28: Rome is in Lazio province and Naples in Campania .
How am I supposed to read this tree? Alternatively, is there Python code that reads it properly? Thanks.
Answer 1:
NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is called fromstring. You can then iterate over its subtrees, leaves, etc.
As an aside: you might want to remove the "sent28:" prefix before parsing, as it confuses the parser (it is also not part of the sentence). It is likely why the top-level node is FRAG: you are not getting a full parse of the sentence, just a sentence fragment.
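A minimal sketch of the nltk.tree.Tree.fromstring approach, assuming NLTK is installed. The bracketed string is the tree from the question, copied unchanged (including the sent28 fragment); the variable names are only for illustration. Each "(LABEL ...)" group is one node: the first token is the node's label, and what follows is either its child nodes or, at the lowest level, the word itself.

```python
from nltk.tree import Tree

# The bracketed parse from the question, unchanged.
bracketed = """(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))"""

# Parse the bracketed text into a Tree object.
tree = Tree.fromstring(bracketed)

# Leaves are the words, in sentence order.
words = tree.leaves()

# pos() pairs each word with its part-of-speech tag (the preterminal label).
tagged = tree.pos()

# subtrees() walks every node; here we keep only the noun phrases.
np_phrases = [
    " ".join(subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]

print(tree.label())
print(words)
print(tagged)
print(np_phrases)
```

Reading the output against the bracketed text is a quick way to get used to the format: every NP printed here corresponds to one "(NP ...)" group in the string.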
Answer 2:
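You can also just use the Stanford parser and draw the resulting trees, like this:
    # Assumption: `parser` is an already-configured Stanford parser object from NLTK
    # (its setup is not shown in this answer).
    sentences = parser.raw_parse_sents(["Hello, My name is Melroy.", "What is your name?"])
    # For a single string, raw_parse() is probably the method to use;
    # for a list of sentences that have already been split into tokens, parse_sents().
    for line in sentences:
        for sentence in line:
            sentence.draw()  # opens a window showing the tree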
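If no graphical display is available, a tree can also be rendered as text. The following is a sketch under assumptions: it uses NLTK's Tree.pretty_print and Tree.pformat rather than draw(), and the bracketed string is a simplified piece of the question's tree, cut down by hand for illustration rather than produced by a parser.

```python
from nltk.tree import Tree

# Hand-simplified fragment of the tree from the question (illustrative only).
tree = Tree.fromstring(
    "(S (NP (NNP Rome)) (VP (VBZ is) (PP (IN in) (NP (NNP Lazio) (NN province)))))"
)

# Draw the tree as ASCII art on standard output (no GUI needed).
tree.pretty_print()

# Render the tree back to bracketed text; a large margin keeps it on one line.
flat = tree.pformat(margin=1000)
print(flat)
```

The bracketed text returned by pformat can be passed to Tree.fromstring again, so the two calls can be used to round-trip trees through plain text files.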
Source: https://stackoverflow.com/questions/28674417/how-to-read-constituency-based-parse-tree