Question
I'm looking for a CFG parser implemented in Java. I'm trying to parse a natural language, and I need all possible parse trees (the grammar is ambiguous), not just one of them. I have already looked at many NLP parsers, such as the Stanford parser, but they mostly require statistical data (a treebank, which I don't have), and adapting them to a new language is difficult and poorly documented. I found some parser generators, such as ANTLR and JFlex, but I'm not sure they can handle ambiguity. Which parser generator or Java library would be best for me? Thanks in advance.
Answer 1:
You want a parser that uses the Earley algorithm, which handles arbitrary context-free grammars, including ambiguous ones. I haven't used either of these two libraries, but PEN and PEP appear to implement this algorithm in Java.
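To make the Earley idea concrete, here is a minimal recognizer sketch. It is not taken from PEN or PEP; the class name `EarleySketch`, the `Rule` and `Item` records, and the toy grammar are all hypothetical. It assumes Java 16+ (for records) and a grammar with no empty productions, and it only answers whether the input is in the language. A full parser would also keep back-pointers in the chart so that every parse tree can be read out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class EarleySketch {
    // A production such as S -> S + S. Any symbol that never appears
    // on a left-hand side is treated as a terminal.
    record Rule(String lhs, List<String> rhs) {}

    // An Earley item: a rule, a dot position, and the input index where it started.
    record Item(Rule rule, int dot, int origin) {
        boolean complete() { return dot == rule.rhs().size(); }
        String next() { return rule.rhs().get(dot); }
        Item advance() { return new Item(rule, dot + 1, origin); }
    }

    // Assumption: the grammar has no empty productions.
    static boolean recognizes(List<Rule> grammar, String start, List<String> tokens) {
        Set<String> nonterminals = new LinkedHashSet<>();
        for (Rule r : grammar) nonterminals.add(r.lhs());

        int n = tokens.size();
        List<List<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new ArrayList<>());
        for (Rule r : grammar) {
            if (r.lhs().equals(start)) add(chart.get(0), new Item(r, 0, 0));
        }

        for (int i = 0; i <= n; i++) {
            List<Item> set = chart.get(i);
            for (int k = 0; k < set.size(); k++) { // the set grows while we scan it
                Item item = set.get(k);
                if (item.complete()) {
                    // Completer: advance every item that was waiting for this nonterminal.
                    List<Item> waiting = chart.get(item.origin());
                    for (int j = 0; j < waiting.size(); j++) {
                        Item p = waiting.get(j);
                        if (!p.complete() && p.next().equals(item.rule().lhs())) {
                            add(set, p.advance());
                        }
                    }
                } else if (nonterminals.contains(item.next())) {
                    // Predictor: start every rule for the expected nonterminal here.
                    for (Rule r : grammar) {
                        if (r.lhs().equals(item.next())) add(set, new Item(r, 0, i));
                    }
                } else if (i < n && item.next().equals(tokens.get(i))) {
                    // Scanner: the expected terminal matches the next token.
                    add(chart.get(i + 1), item.advance());
                }
            }
        }
        for (Item item : chart.get(n)) {
            if (item.complete() && item.origin() == 0 && item.rule().lhs().equals(start)) {
                return true;
            }
        }
        return false;
    }

    static void add(List<Item> set, Item item) {
        if (!set.contains(item)) set.add(item);
    }

    public static void main(String[] args) {
        // Toy ambiguous grammar: S -> S + S | a
        List<Rule> grammar = List.of(
            new Rule("S", List.of("S", "+", "S")),
            new Rule("S", List.of("a")));
        System.out.println(recognizes(grammar, "S", List.of("a", "+", "a", "+", "a")));
    }
}
```

The left-recursive, ambiguous rule `S -> S + S` is handled without any grammar rewriting, which is the main reason Earley-style parsers suit natural-language grammars.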
Answer 2:
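Another option is Bison, which implements GLR. GLR is an LR-type parsing algorithm that supports ambiguous grammars. Bison can also generate Java code, in addition to C and C++, although it is worth checking the Bison manual to confirm that GLR mode is available for the Java output, since not every Bison feature is supported for every target language.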
Answer 3:
Take a look at the related discussion here. In my last comment in that discussion I explain that you can make any parser generator produce all of the parse trees by cloning the parse tree derived so far before making the derivation fail.
If your grammar is:
G -> ...
You would augment it like this:
G' -> G {semantic:deal-with-complete-parse-tree} <NOT-VALID-TOKEN>.
The parsing engine will ultimately fail on every derivation, but by then your program will have either:
- Saved clones of all the trees.
- Dealt with the semantics of each of the trees as they were found.
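The underlying idea, continuing the search after each complete tree instead of stopping at the first one, can be sketched without any parser generator. The code below is not ANTLR or JavaCC code; it is a hand-written exhaustive search over a hypothetical toy grammar, and the class name `AllParsesSketch` and its methods are made up for illustration. It assumes Java 9+ and a grammar with no empty productions and no cycles of unit rules (otherwise the recursion would not terminate).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AllParsesSketch {
    // Hypothetical toy grammar, ambiguous on purpose: S -> S + S | a
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "S", List.of(List.of("S", "+", "S"), List.of("a")));

    // Returns every tree (as a bracketed string) that derives tokens[from, to) from symbol.
    static List<String> parse(String symbol, List<String> tokens, int from, int to) {
        List<String> trees = new ArrayList<>();
        if (!GRAMMAR.containsKey(symbol)) {
            // Terminal: it must match exactly one token.
            if (to - from == 1 && tokens.get(from).equals(symbol)) trees.add(symbol);
            return trees;
        }
        // Try every alternative; never stop at the first success.
        for (List<String> rhs : GRAMMAR.get(symbol)) {
            for (String children : expand(rhs, 0, tokens, from, to)) {
                trees.add("(" + symbol + " " + children + ")");
            }
        }
        return trees;
    }

    // Returns every way rhs[pos..] can derive tokens[from, to),
    // with each symbol covering at least one token.
    static List<String> expand(List<String> rhs, int pos, List<String> tokens, int from, int to) {
        int remaining = rhs.size() - pos - 1; // symbols still to place after this one
        if (remaining == 0) {
            return parse(rhs.get(pos), tokens, from, to);
        }
        List<String> results = new ArrayList<>();
        for (int mid = from + 1; mid <= to - remaining; mid++) {
            List<String> heads = parse(rhs.get(pos), tokens, from, mid);
            if (heads.isEmpty()) continue;
            for (String tail : expand(rhs, pos + 1, tokens, mid, to)) {
                for (String head : heads) results.add(head + " " + tail);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "+", "a", "+", "a");
        // Prints one line per parse tree of the ambiguous input.
        for (String tree : parse("S", tokens, 0, tokens.size())) {
            System.out.println(tree);
        }
    }
}
```

This brute-force version recomputes sub-spans and can take exponential time, so it is only meant to show the "collect every complete tree" behaviour on small inputs. Memoizing `parse` by symbol and span would turn it into a chart parser.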
Both ANTLR and JavaCC did well when I was teaching. My preference was for ANTLR because of its BNF lexical analysis and its much less convoluted history, vision, and licensing.
Source: https://stackoverflow.com/questions/4584684/java-cfg-parser-that-supports-ambiguities