Question
I need to extract triplets of the form NP-VP-NP
from the dependency parse tree produced as the output of lexicalized parsing in Stanford Parser.
What's the best way to do this? E.g., if the parse tree is as follows:
(ROOT
  (S
    (S
      (NP (NNP Exercise))
      (VP (VBZ reduces)
        (NP (NN stress)))
      (. .))
    (NP (JJ Regular) (NN exercise))
    (VP (VBZ maintains)
      (NP (JJ mental) (NN fitness)))
    (. .)))
I need to extract 2 triplets:
- Exercise-reduces-stress and
- Regular exercise-maintains-mental fitness
Any ideas?
Answer 1:
There are two natural options here. One is to run Semgrex over the dependency tree (side note: what you have in the question is a constituency tree), with a pattern like:
{pos:/V.*/}=verb >/.subj.*/ {}=subject >/.obj/ {}=object
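To make concrete what that pattern asks for, here is a minimal Python sketch that mimics it on a hand-written dependency graph. It does not call Semgrex or the Stanford Parser. The token list, the edge list (using Stanford-style labels such as nsubj and dobj), and the helper names `subtree_text` and `extract_triplets` are all assumptions made for illustration, as is the use of whole-string regex matching to imitate the pattern. Since the pattern only binds the head words (exercise, maintains, fitness), the sketch also expands each head to the words of its subtree to recover phrases like "Regular exercise".

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    index: int  # 1-based position in the sentence
    word: str
    pos: str


# Hypothetical, hand-written analysis of "Regular exercise maintains mental fitness".
# Edges are (governor index, relation, dependent index); not real parser output.
TOKENS = [
    Token(1, "Regular", "JJ"),
    Token(2, "exercise", "NN"),
    Token(3, "maintains", "VBZ"),
    Token(4, "mental", "JJ"),
    Token(5, "fitness", "NN"),
]
EDGES = [
    (3, "nsubj", 2),
    (2, "amod", 1),
    (3, "dobj", 5),
    (5, "amod", 4),
]


def subtree_text(head, tokens, edges):
    """Return the words of `head` and all its descendants, in sentence order."""
    seen = {head}
    stack = [head]
    while stack:
        node = stack.pop()
        for gov, _rel, dep in edges:
            if gov == node and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    by_index = {t.index: t for t in tokens}
    return " ".join(by_index[i].word for i in sorted(seen))


def extract_triplets(tokens, edges):
    """Find verbs that govern both a subject-like and an object-like dependent."""
    triplets = []
    for tok in tokens:
        # {pos:/V.*/}=verb
        if not re.fullmatch(r"V.*", tok.pos):
            continue
        # >/.subj.*/ {}=subject  and  >/.obj/ {}=object
        subjects = [d for g, r, d in edges
                    if g == tok.index and re.fullmatch(r".subj.*", r)]
        objects = [d for g, r, d in edges
                   if g == tok.index and re.fullmatch(r".obj", r)]
        for s in subjects:
            for o in objects:
                triplets.append((subtree_text(s, tokens, edges),
                                 tok.word,
                                 subtree_text(o, tokens, edges)))
    return triplets


print(extract_triplets(TOKENS, EDGES))
```

Expanding to the full subtree is a simplification: on longer sentences it would also pull in relative clauses or prepositional phrases attached to the noun, so a real implementation might restrict which relations are followed.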
Another option is to use the Stanford Open IE system. This will give you a broader notion of '(subject; relation; object)' triples, where the relation does not have to be a verb.
Source: https://stackoverflow.com/questions/33733669/extract-np-vp-np-from-stanford-dependency-parse-tree