I have been trying to use the dependency parse trees generated by CMU's TurboParser. It works flawlessly. The problem, however, is that there is very little documentation. I need to precisely understand the output of their parser. For example, the sentence "I solved the problem with statistics." generates the following output:
1 I _ PRP PRP _ 2 SUB
2 solved _ VBD VBD _ 0 ROOT
3 the _ DT DT _ 4 NMOD
4 problem _ NN NN _ 2 OBJ
5 with _ IN IN _ 2 VMOD
6 statistics _ NNS NNS _ 5 PMOD
7 . _ . . _ 2 P
I haven't found any documentation that explains what the various columns stand for, or how the indices in the second-to-last column (2, 0, 4, 2, ...) are created. I also have no idea why two columns are devoted to part-of-speech tags. Any help (or a link to external documentation) would be greatly appreciated.
P.S. If you want to try out their parser, here is their online demo.
P.P.S. Please do not suggest using Stanford's dependency parse output. I am interested in linear programming algorithms, which Stanford's NLP system does not use.
I don't know TurboParser, but my guess is that the first number indicates the id of the token and that the second number indicates the id of its governor. That is, for your example:
solved(
I,
problem(the),
with(statistics),
.
)
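To check that reading, here is a minimal Python sketch that rebuilds the bracketed form from the output in the question. It assumes the columns are whitespace-separated, that the first column is the token id, and that the seventh column is the governor's id, with 0 meaning "no governor". The names `CONLL` and `bracketed` are made up for this illustration and are not part of TurboParser.

```python
from collections import defaultdict

# The parser output quoted in the question, one token per line.
CONLL = """\
1 I _ PRP PRP _ 2 SUB
2 solved _ VBD VBD _ 0 ROOT
3 the _ DT DT _ 4 NMOD
4 problem _ NN NN _ 2 OBJ
5 with _ IN IN _ 2 VMOD
6 statistics _ NNS NNS _ 5 PMOD
7 . _ . . _ 2 P
"""


def bracketed(conll_text):
    """Render the parse as governor(dependent, ...) using ids and head ids."""
    words = {}                    # token id -> word form
    children = defaultdict(list)  # governor id -> ids of its dependents
    for line in conll_text.strip().splitlines():
        cols = line.split()
        token_id, head_id = int(cols[0]), int(cols[6])
        words[token_id] = cols[1]
        children[head_id].append(token_id)

    def render(token_id):
        kids = children.get(token_id, [])
        if not kids:
            return words[token_id]
        inner = ", ".join(render(kid) for kid in kids)
        return f"{words[token_id]}({inner})"

    # Tokens whose head id is 0 have no governor: they are the roots.
    return ", ".join(render(root) for root in children[0])


print(bracketed(CONLL))
```

For the example sentence this prints `solved(I, problem(the), with(statistics), .)`, which matches the structure above.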
Actually, that's CoNLL-X format. You can get more information here: http://ilk.uvt.nl/conll/#dataformat
Here is the meaning of each of the columns TurboParser outputs:
- id of the token, i.e. its one-based index in the sentence
- original token as it was in the original text
- lemma, the lemmatized form of the token (empty here, because no lemmatizer has been set)
- tag (coarse-grained part-of-speech tag)
- tag (fine-grained part-of-speech tag, which is the same as the coarse-grained tag in TurboParser's output)
- morphological features (empty here)
- head of the token, represented by its index (the root token has a head value of 0)
- relation of the current token with its head
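The column layout above can be sketched in Python as a small record type. This assumes exactly the eight whitespace-separated columns shown in the question (the full CoNLL-X format defines two further columns, PHEAD and PDEPREL, which do not appear in this output). `Token` and `parse_line` are hypothetical names for this example, not part of TurboParser; the field names follow the CoNLL-X column names.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    id: int                # one-based index of the token in the sentence
    form: str              # the token as it appeared in the text
    lemma: Optional[str]   # None when the column holds "_"
    cpostag: str           # coarse-grained part-of-speech tag
    postag: str            # fine-grained part-of-speech tag
    feats: Optional[str]   # morphological features, None when "_"
    head: int              # id of the head token, 0 for the root
    deprel: str            # relation of this token to its head


def parse_line(line: str) -> Token:
    """Split one output line into named fields (assumes 8 columns)."""
    cols = line.split()
    if len(cols) != 8:
        raise ValueError(f"expected 8 columns, got {len(cols)}: {line!r}")

    def blank_to_none(value: str) -> Optional[str]:
        return None if value == "_" else value

    return Token(
        id=int(cols[0]),
        form=cols[1],
        lemma=blank_to_none(cols[2]),
        cpostag=cols[3],
        postag=cols[4],
        feats=blank_to_none(cols[5]),
        head=int(cols[6]),
        deprel=cols[7],
    )


token = parse_line("4 problem _ NN NN _ 2 OBJ")
print(token.form, token.head, token.deprel)
```

Here `token.head` is 2, so "problem" attaches to token 2 ("solved") with the relation `OBJ`.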
The generated output you gave can be represented as a dependency-based parse tree:
For further information on the CoNLL-X format:
Source: https://stackoverflow.com/questions/24394196/what-does-the-dependency-parse-output-of-turboparser-mean