Combined unparser/parser generator

后端 未结 5 900
孤街浪徒
孤街浪徒 2021-02-02 15:08

Is there a parser generator that also implements the inverse direction, i.e. unparsing domain objects (a.k.a. pretty-printing) from the same grammar specification? As far as I k

相关标签:
5条回答
  • 2021-02-02 15:37

    It is not possible in general.

    What makes a print pretty? A print is pretty, if spaces, tabs or newlines are at those positions, which make the print looking nicely.

    But most grammars ignore white spaces, because in most languages white spaces are not significant. There are exceptions like Python but in general the question, whether it is a good idea to use white spaces as syntax, is still controversial. And therefor most grammars do not use white spaces as syntax.

    And if the abstract syntax tree does not contain white spaces, because the parser has thrown them away, no generator can use them to pretty print an AST.

    0 讨论(0)
  • 2021-02-02 15:48

    Take a look at Invertible syntax descriptions: Unifying parsing and pretty printing.

    0 讨论(0)
  • 2021-02-02 15:49

    There are several parser generators that include an implementation of an unparser. One of them is the nearley parser generator for context-free grammars.

    It is also possible to implement bidirectional transformations of source code using definite clause grammars. In SWI-Prolog, the phrase/2 predicate can convert an input text into a parse tree and vice-versa.

    0 讨论(0)
  • 2021-02-02 15:58

    I have implemented a set of Invertible Parser Combinators in Java and Kotlin. A parser is written pretty much in LL-1 style and it provides a parse- and a print-method where the latter provides the pretty printer.

    You can find the project here: https://github.com/searles/parsing Here is a tutorial: https://github.com/searles/parsing/blob/master/tutorial.md And here is a parser/pretty printer for mathematical expressions: https://github.com/searles/parsing/blob/master/src/main/java/at/searles/demo/DemoInvert.kt

    0 讨论(0)
  • 2021-02-02 15:59

    Our DMS Software Reengineering Toolkit does precisely this (and provides a lot of additional support for analyzing/transforming code). It does this by decorating a language grammar with additional attributes, producing what is called an attribute grammar. We use a special DSL to write these rules to make them convenient to write.

    It helps to know that DMS produces a tree based directly on the grammar.

    Each DMS grammar rule is paired with with so-called "prettyprinting" rule. Each prettyprinting rule describes how to "prettyprint" the syntactic element and sub-elements recognized by its corresponding grammar rule. The prettyprinting process essentially manufactures or combines rectangular boxes of text horizontally or vertically (with optional indentation), with leaves producing unit-height boxes containing the literal value of the leaf (keyword, operator, identifier, constant, etc.

    As an example, one might write the following DMS grammar rule and matching prettyprinting rule:

    statement = 'for' '(' assignment ';' assignment ';' conditional_expression ')'
                '{' sequence_of_statements '}' ;
    <<PrettyPrinter>>: 
        { V(H('for','(',assignment[1],';','assignment[2],';',conditional_expression,')'),
            H('{', I(sequence_of_statements)),
            '}');
    

    This will parse the following:

        for ( i=x*2;
           i--;  i>-2*x ) {  a[x]+=3; 
          b[x]=a[x]-1; }
    

    (using additional grammar rules for statements and expressions) and prettyprint it (using additional prettyprinting rules for those additional grammar rules) as follows:

        for (i=x*2;i--;i>-2*x)
        {   a[x]+=3;
            b[x]=a[x]-1;
        }
    

    DMS also captures comments, attaches them to AST nodes, and regenerates them on output. The implementation is a bit exotic because most parsers don't handle comments, but utilization is easy, even "free"; comments will be automatically inserted in the prettyprinted result in their original places.

    DMS can also print in "fidelity" mode. In this form, it tries to preserve the shape of the toke (e.g., number radix, identifier character capitalization, which keyword spelling was used) the column offset (into the line) of a parsed token. This would cause the original text (or something so close that you don't think it is different) to get regenerated.

    More details about what prettyprinters must do are provided in my SO answer on Compiling an AST back to source code. DMS addresses all of those topics cleanly.

    This capability has been used by DMS on some 40+ real languages, including full IBM COBOL, PL/SQL, Java 1.8, C# 5.0, C (many dialects) and C++14.

    By writing a sufficiently interesting set of prettyprinter rules, you can build things like JavaDoc extended to include hyperlinked source code.

    0 讨论(0)
提交回复
热议问题