Combined unparser/parser generator

孤街浪徒 2021-02-02 15:08

Is there a parser generator that also implements the inverse direction, i.e. unparsing domain objects (a.k.a. pretty-printing) from the same grammar specification? As far as I k

5 answers
  •  无人及你
    2021-02-02 15:59

    Our DMS Software Reengineering Toolkit does precisely this (and provides a lot of additional support for analyzing/transforming code). It does this by decorating a language grammar with additional attributes, producing what is called an attribute grammar. We use a special DSL to write these rules to make them convenient to write.

    It helps to know that DMS produces a tree based directly on the grammar.

    Each DMS grammar rule is paired with a so-called "prettyprinting" rule. Each prettyprinting rule describes how to "prettyprint" the syntactic element and sub-elements recognized by its corresponding grammar rule. The prettyprinting process essentially manufactures or combines rectangular boxes of text horizontally or vertically (with optional indentation), with leaves producing unit-height boxes containing the literal value of the leaf (keyword, operator, identifier, constant, etc.).

    As an example, one might write the following DMS grammar rule and matching prettyprinting rule:

    statement = 'for' '(' assignment ';' assignment ';' conditional_expression ')'
                '{' sequence_of_statements '}' ;
    <>: 
        { V(H('for','(',assignment[1],';',assignment[2],';',conditional_expression,')'),
            H('{', I(sequence_of_statements)),
            '}');
    

    This will parse the following:

        for ( i=x*2;
           i--;  i>-2*x ) {  a[x]+=3; 
          b[x]=a[x]-1; }
    

    (using additional grammar rules for statements and expressions) and prettyprint it (using additional prettyprinting rules for those additional grammar rules) as follows:

        for (i=x*2;i--;i>-2*x)
        {   a[x]+=3;
            b[x]=a[x]-1;
        }
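
The box model described above can be sketched in a few lines of Python. This is not DMS code: the functions `H`, `V` and `I` only borrow the names used in the prettyprinting rule, and the representation of a box as a list of lines, the top-aligned behaviour of `H`, the 3-column indent, and the trailing space in `'for '` are all assumptions chosen so that the sketch reproduces the layout shown above.

```python
from typing import List, Union

Box = List[str]          # a box is a list of text lines, top to bottom
Item = Union[str, Box]   # a bare string is treated as a unit-height box


def as_box(item: Item) -> Box:
    """Turn a leaf string into a one-line box; copy an existing box."""
    return [item] if isinstance(item, str) else list(item)


def width(box: Box) -> int:
    """Width of the widest line in the box (0 for an empty box)."""
    return max((len(line) for line in box), default=0)


def H(*items: Item) -> Box:
    """Horizontal composition: place boxes side by side, top-aligned."""
    result: Box = []
    for item in items:
        right = as_box(item)
        w = width(result)
        height = max(len(result), len(right))
        left = result + [""] * (height - len(result))
        right = right + [""] * (height - len(right))
        # Pad every left line to the same width so the result stays rectangular.
        result = [l.ljust(w) + r for l, r in zip(left, right)]
    return result


def V(*items: Item) -> Box:
    """Vertical composition: stack boxes on top of each other."""
    result: Box = []
    for item in items:
        result.extend(as_box(item))
    return result


def I(item: Item, indent: int = 3) -> Box:
    """Indentation: shift a box to the right (indent width is an assumption)."""
    return [" " * indent + line for line in as_box(item)]


def render(box: Box) -> str:
    """Join the lines of a box, dropping trailing padding."""
    return "\n".join(line.rstrip() for line in box)


# Hand-built boxes standing in for the sub-elements of the 'for' rule.
header = H("for ", "(", "i=x*2", ";", "i--", ";", "i>-2*x", ")")
body = V("a[x]+=3;", "b[x]=a[x]-1;")
block = V(header, H("{", I(body)), "}")
print(render(block))
```

In a real tool the leaf boxes would come from the parse tree rather than being written by hand; the point here is only that three small combinators are enough to express the layout.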
    

    DMS also captures comments, attaches them to AST nodes, and regenerates them on output. The implementation is a bit exotic because most parsers don't handle comments, but using it is easy, even "free": comments are automatically inserted in the prettyprinted result in their original places.

    DMS can also print in "fidelity" mode. In this mode, it tries to preserve the shape of each token (e.g., number radix, identifier character capitalization, which keyword spelling was used) and the column offset (into the line) of each parsed token. This regenerates the original text, or something so close that you would not notice the difference.
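
To make the idea of fidelity printing concrete, here is a minimal Python sketch. It does not reflect how DMS is implemented: the `Token` class, its fields, and both printing functions are hypothetical, and the sketch simply assumes each token remembers its original spelling, line, and column alongside a normalized value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    value: str     # normalized value, e.g. upper-cased keyword or decimal number
    spelling: str  # text exactly as it appeared in the source
    line: int      # 1-based source line
    column: int    # 0-based column offset into the line


def print_normalized(tokens: List[Token]) -> str:
    """Ignore the original layout: emit normalized values separated by spaces."""
    return " ".join(tok.value for tok in tokens)


def print_fidelity(tokens: List[Token]) -> str:
    """Emit each token with its original spelling at its original column."""
    lines: Dict[int, str] = {}
    for tok in sorted(tokens, key=lambda t: (t.line, t.column)):
        current = lines.get(tok.line, "")
        if len(current) <= tok.column:
            # Pad with spaces up to the recorded column.
            current = current.ljust(tok.column)
        else:
            # The column is already occupied (e.g. after an edit): fall back
            # to a single separating space.
            current += " "
        lines[tok.line] = current + tok.spelling
    last_line = max(lines, default=0)
    return "\n".join(lines.get(n, "") for n in range(1, last_line + 1))


tokens = [
    Token("BEGIN", "Begin", 1, 2),
    Token("X", "x", 1, 9),
    Token(":=", ":=", 1, 11),
    Token("31", "0x1F", 1, 14),
]
print(print_normalized(tokens))
print(print_fidelity(tokens))
```

The fallback branch matters once the tree has been transformed: a token that no longer fits at its recorded column is still printed, just without its original offset.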

    More details about what prettyprinters must do are provided in my SO answer on "Compiling an AST back to source code". DMS addresses all of those topics cleanly.

    This capability has been used by DMS on some 40+ real languages, including full IBM COBOL, PL/SQL, Java 1.8, C# 5.0, C (many dialects) and C++14.

    By writing a sufficiently interesting set of prettyprinter rules, you can build things like JavaDoc extended to include hyperlinked source code.
