Question
I found that NLTK in Python does this via the *raw_parse* function, but I need to use Java. I found that ClearTK has a MaltParser wrapper, but there is no documentation for it. I'm looking for a function or a project that first converts raw English text to a CoNLL file that MaltParser can use and then parses it with MaltParser. Any help is appreciated.
Answer 1:
The MaltParser 1.7.2 distribution ships with examples in the folder examples/apiexamples/srcex.
However, these examples only show how to run MaltParser programmatically after tokenization and POS tagging have already been performed (and after the output of these steps has been converted to a CoNLL-like format).
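To illustrate the conversion step mentioned above, here is a minimal sketch that turns already tokenized and POS-tagged input into CoNLL-style rows (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, separated by tabs). The class name `ConllFormatter` and the method `toConllTokens` are hypothetical names made up for this illustration and are not part of MaltParser. It also assumes that reusing the same tag for both the coarse and the fine POS column is acceptable for your model, and that lemma and features can be left as `_`.

```java
// Hypothetical helper (not part of MaltParser): builds CoNLL-style rows
// from tokens and POS tags that some tokenizer/tagger has already produced.
final class ConllFormatter {
    private ConllFormatter() {}

    /**
     * Returns one tab-separated row per token with the columns
     * ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS.
     * Assumption: the same tag fills both CPOSTAG and POSTAG;
     * LEMMA and FEATS are unknown and written as "_".
     */
    static String[] toConllTokens(String[] forms, String[] posTags) {
        if (forms.length != posTags.length) {
            throw new IllegalArgumentException(
                    "forms and posTags must have the same length");
        }
        String[] rows = new String[forms.length];
        for (int i = 0; i < forms.length; i++) {
            // CoNLL token IDs start at 1, not 0.
            rows[i] = String.join("\t",
                    Integer.toString(i + 1), // ID
                    forms[i],                // FORM
                    "_",                     // LEMMA (unknown)
                    posTags[i],              // CPOSTAG
                    posTags[i],              // POSTAG
                    "_");                    // FEATS (none)
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] forms = {"Dogs", "bark", "."};
        String[] tags = {"NNS", "VBP", "."};
        for (String row : toConllTokens(forms, tags)) {
            System.out.println(row);
        }
    }
}
```

If I recall the bundled API examples correctly, an array of rows in this shape is what they pass to the parser, so the output of such a helper could probably be handed over the same way. Please check the column layout against the examples in srcex and against the model you use before relying on it.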
Since I currently cannot offer a better (simpler/shorter) alternative, I can at least share a link to a Groovy script which performs tokenization, part-of-speech tagging (using OpenNLP) and dependency parsing (using MaltParser). The tools are made interoperable using UIMA. If you are familiar with Maven, it should be quite straightforward to derive a Java version of that script.
Mind, this is not the best answer, but at this point possibly better than nothing.
Note: I'm a developer on both Apache UIMA and DKPro Core (the project to which the link points).
Source: https://stackoverflow.com/questions/17392790/parse-raw-text-with-maltparser-in-java