Which tool to use to parse programming languages in Python?

情书的邮戳 2021-01-30 07:09

Which Python tool can you recommend to parse programming languages? It should allow for a readable representation of the language grammar inside the source, and it should be abl

9 answers
  • 2021-01-30 07:35

    Ned Batchelder wrote a survey of Python parsing tools, which he apparently keeps up to date (last updated July 2010):

    http://nedbatchelder.com/text/python-parsers.html

    If I needed a parser today, I would either write my own recursive descent parser or use PLY or LEPL, depending on my needs and on whether I was willing to introduce an external dependency. I personally wouldn't use PyParsing for anything very complicated.
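
    For a sense of what rolling your own involves, here is a minimal sketch of a recursive descent parser for arithmetic expressions, using only the standard library. The grammar, the `tokenize` helper, and the `Parser` class are illustrative assumptions and are not taken from any of the tools mentioned here.

```python
import re

# One token: an integer or a single-character operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split text into a list of token strings, rejecting unknown characters."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns a nested-tuple AST node."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def parse(text):
    """Parse a complete expression and fail on trailing tokens."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

    Calling `parse("1 + 2 * (3 - 4)")` yields a nested tuple in which `*` binds tighter than `+`. The grammar lives in the method structure rather than in a separate notation, which is the main trade-off compared with a grammar-driven tool.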

  • 2021-01-30 07:38

    ANTLR is what you should look at: http://www.antlr.org

    In particular, take a look at its Python target: http://www.antlr.org/wiki/display/ANTLR3/Antlr3PythonTarget

  • 2021-01-30 07:42

    If you're evaluating PyParsing, I think you should look at funcparserlib: http://pypi.python.org/pypi/funcparserlib

    It's somewhat similar, but in my experience the resulting code is much cleaner.

  • 2021-01-30 07:45

    I really like pyPEG. Its error reporting isn't very friendly, but it can add source code locations to the AST.

    pyPEG doesn't have a separate lexer, which would make parsing Python itself hard (I think CPython recognises indent and dedent in the lexer), but I've used pyPEG to build a parser for a subset of C# with surprisingly little work.
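
    The lexer-level handling of indentation can be observed with the standard-library `tokenize` module, which emits `INDENT` and `DEDENT` tokens. This is a small sketch; the `token_names` helper and the sample source are made up for illustration.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that the tokenize module produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# A function body is the only indented block in this sample.
sample = "def f(x):\n    return x\n"
names = token_names(sample)
print(names)
```

    The printed list contains one `INDENT` before the `return` line and one `DEDENT` after it, so a parser working on these tokens never has to count whitespace itself.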

    Here is an example adapted from fdik.org/pyPEG/. Take a simple language like this:

    function fak(n) {
        if (n==0) { // 0! is 1 by definition
            return 1;
        } else {
            return n * fak(n - 1);
        };
    }
    

    A pyPEG parser for that language:

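    import re
    from pyPEG import keyword

    # In a rule, a leading integer is a repetition count:
    # 0 = optional, -1 = zero or more, -2 = one or more.
    def comment():          return [re.compile(r"//.*"),
                                    re.compile(r"/\*.*?\*/", re.S)]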
    def literal():          return re.compile(r'\d*\.\d*|\d+|".*?"')
    def symbol():           return re.compile(r"\w+")
    def operator():         return re.compile(r"\+|\-|\*|\/|\=\=")
    def operation():        return symbol, operator, [literal, functioncall]
    def expression():       return [literal, operation, functioncall]
    def expressionlist():   return expression, -1, (",", expression)
    def returnstatement():  return keyword("return"), expression
    def ifstatement():      return (keyword("if"), "(", expression, ")", block,
                                    keyword("else"), block)
    def statement():        return [ifstatement, returnstatement], ";"
    def block():            return "{", -2, statement, "}"
    def parameterlist():    return "(", symbol, -1, (",", symbol), ")"
    def functioncall():     return symbol, "(", expressionlist, ")"
    def function():         return keyword("function"), symbol, parameterlist, block
    def simpleLanguage():   return function
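
    The terminal rules above are plain regular expressions, so they can be checked on their own with the standard `re` module before any grammar is wired up. This is a sketch that copies the patterns into standalone variables; the `matches` helper is hypothetical and is not part of pyPEG.

```python
import re

# Standalone copies of the terminal patterns from the grammar above.
line_comment = re.compile(r"//.*")
literal = re.compile(r'\d*\.\d*|\d+|".*?"')
symbol = re.compile(r"\w+")
operator = re.compile(r"\+|\-|\*|\/|\=\=")


def matches(pattern, text):
    """Return True if pattern matches the whole of text."""
    return pattern.fullmatch(text) is not None


# Fragments taken from the sample program.
print(matches(symbol, "fak"))                             # identifier
print(matches(literal, "0"))                              # integer literal
print(matches(operator, "=="))                            # comparison
print(matches(operator, "="))                             # single '=' is not an operator
print(matches(line_comment, "// 0! is 1 by definition"))  # comment to end of line
```

    Two quirks show up this way: `symbol` also accepts digit-only strings such as `1`, and `literal` accepts a lone `.`, so the order of alternatives in `expression` matters.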
    
  • 2021-01-30 07:45

    pyPEG (a tool I authored) has a tracing facility for error reporting.

    Just set pyPEG.print_trace = True and pyPEG will give you a full trace of what's happening inside.

  • 2021-01-30 07:47

    I would recommend that you check out my library: https://github.com/erezsh/lark

    It can parse ALL context-free grammars, automatically builds an AST (with line & column numbers), and accepts the grammar in EBNF format, which is considered the standard.

    It can easily parse a language like Python, and it can do so faster than any other parsing library written in Python.
