Java CFG parser that supports ambiguities

Posted by 橙三吉。 on 2019-12-11 06:16:14

Question


I'm looking for a CFG parser implemented in Java. I'm trying to parse a natural language, and I need all possible parse trees (the ambiguities), not just one of them. I have already looked at many NLP parsers, such as the Stanford parser, but they mostly require statistical data (a treebank, which I don't have), and adapting them to a new language is rather difficult and poorly documented. I found some parser generators such as ANTLR and JFlex, but I'm not sure they can handle ambiguities. Which parser generator or Java library would be best for me? Thanks in advance.


Answer 1:


You want a parser that uses the Earley algorithm. I haven't used either of these two libraries, but PEN and PEP appear to implement this algorithm in Java.
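To make the suggestion concrete, here is a minimal sketch of an Earley recognizer, written for illustration only. It is not the API of PEN or PEP; the names `EarleySketch`, `Rule`, `Item`, and `recognizes` are hypothetical. It assumes Java 16+ (for records), a grammar without empty (epsilon) productions, and input that is already split into tokens. It only answers whether the input is derivable; a real library would additionally record back-pointers in the chart so that every parse tree can be read off.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EarleySketch {
    // A production such as E -> E + E. Symbols that never occur as a lhs are terminals.
    record Rule(String lhs, List<String> rhs) {}

    // An Earley item: a rule, a dot position in its rhs, and the input position where it started.
    record Item(Rule rule, int dot, int origin) {
        boolean complete() { return dot == rule.rhs().size(); }
        String next() { return rule.rhs().get(dot); }
        Item advance() { return new Item(rule, dot + 1, origin); }
    }

    // Returns true if `tokens` can be derived from `start`. Assumes no epsilon rules.
    static boolean recognizes(List<Rule> grammar, String start, List<String> tokens) {
        Set<String> nonterminals = new HashSet<>();
        for (Rule r : grammar) nonterminals.add(r.lhs());

        int n = tokens.size();
        List<List<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new ArrayList<>());
        for (Rule r : grammar) {
            if (r.lhs().equals(start)) add(chart.get(0), new Item(r, 0, 0));
        }

        for (int i = 0; i <= n; i++) {
            List<Item> set = chart.get(i);
            for (int k = 0; k < set.size(); k++) { // the set may grow while we scan it
                Item item = set.get(k);
                if (item.complete()) {
                    // Completer: advance every item that was waiting for this nonterminal.
                    List<Item> waiting = chart.get(item.origin());
                    for (int j = 0; j < waiting.size(); j++) {
                        Item w = waiting.get(j);
                        if (!w.complete() && w.next().equals(item.rule().lhs())) add(set, w.advance());
                    }
                } else if (nonterminals.contains(item.next())) {
                    // Predictor: start every rule of the expected nonterminal at position i.
                    for (Rule r : grammar) {
                        if (r.lhs().equals(item.next())) add(set, new Item(r, 0, i));
                    }
                } else if (i < n && item.next().equals(tokens.get(i))) {
                    // Scanner: the expected terminal matches the current token.
                    add(chart.get(i + 1), item.advance());
                }
            }
        }

        for (Item item : chart.get(n)) {
            if (item.complete() && item.origin() == 0 && item.rule().lhs().equals(start)) return true;
        }
        return false;
    }

    private static void add(List<Item> set, Item item) {
        if (!set.contains(item)) set.add(item);
    }
}
```

With the ambiguous grammar E -> E + E | n, calling `recognizes` on the tokens `n + n + n` returns true even though the grammar is left-recursive and the input has two derivations, which is the property that makes Earley-style parsers a fit for natural-language grammars.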




Answer 2:


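Another option is Bison, which implements GLR, an LR-type parsing algorithm that supports ambiguous grammars. Bison can also generate Java code in addition to C and C++, but as far as I know its manual lists GLR as unsupported for the Java skeleton, so check whether GLR is available for Java output in your Bison version.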




Answer 3:


Take a look at the related discussion here. In my last comment in that discussion I explain that you can make any parser generator produce all of the parse trees by cloning the parse tree derived so far before making the derivation fail.

If your grammar is:

G -> ...

You would augment it like this:

G' -> G {semantic:deal-with-complete-parse-tree} <NOT-VALID-TOKEN>.

The parsing engine will ultimately fail on all derivations, but your program will either have:

  • Saved clones of all the trees.
  • Dealt with the semantics of each of the trees as they were found.
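As a rough illustration of this idea, the sketch below uses a hand-written backtracking parser rather than a generated one, so it is a stand-in for the technique and not ANTLR or JavaCC output. The toy grammar (S -> X S | X, X -> "a" | "a" "a") and the names `AllParsesSketch`, `allParses`, `parseS`, and `parseX` are assumptions made for the example. Trees are built as immutable strings, so "cloning" is simply keeping a reference. The top-level continuation plays the role of `{semantic:deal-with-complete-parse-tree} <NOT-VALID-TOKEN>`: it saves the tree and then reports failure, which forces the parser to backtrack into every remaining alternative. As an extra assumption, it saves a tree only when the whole input has been consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class AllParsesSketch {
    // Collects every parse tree of `input` under the toy grammar.
    static List<String> allParses(String input) {
        List<String> saved = new ArrayList<>();
        // G' -> S {save tree} <NOT-VALID-TOKEN>
        parseS(input, 0, (tree, pos) -> {
            if (pos == input.length()) saved.add(tree); // semantic action on a complete tree
            return false; // the invalid token never matches, so this derivation fails
        });
        return saved;
    }

    // S -> X S | X
    // The continuation k receives the tree for S and the position after it;
    // it returns true only if the rest of the parse succeeded.
    static boolean parseS(String in, int pos, BiPredicate<String, Integer> k) {
        return parseX(in, pos, (x, p1) ->
                   parseS(in, p1, (s, p2) -> k.test("S(" + x + " " + s + ")", p2)))
            || parseX(in, pos, (x, p1) -> k.test("S(" + x + ")", p1));
    }

    // X -> "a" | "a" "a"
    static boolean parseX(String in, int pos, BiPredicate<String, Integer> k) {
        return (in.startsWith("a", pos) && k.test("X(a)", pos + 1))
            || (in.startsWith("aa", pos) && k.test("X(aa)", pos + 2));
    }
}
```

For the input `"aaa"`, `allParses` returns three trees: `S(X(a) S(X(a) S(X(a))))`, `S(X(a) S(X(aa)))`, and `S(X(aa) S(X(a)))`. Plain backtracking like this does not terminate on left-recursive rules and can take exponential time, which is one reason chart-based algorithms such as Earley or GLR are usually preferred for large natural-language grammars.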

Both ANTLR and JavaCC did well when I was teaching. My preference was for ANTLR because of its BNF lexical analysis and its much less convoluted history, vision, and licensing.



Source: https://stackoverflow.com/questions/4584684/java-cfg-parser-that-supports-ambiguities
