Theory, examples of reversible parsers?

Question

Does anyone out there know of examples of, and the theory behind, parsers that take (maybe) an abstract syntax tree and produce code, instead of vice versa? Mathematically, or at least intuitively, I believe the code->AST function is reversible, but I'm trying to find work or examples of this, besides the usual resources like the Dragon book and such. Any ideas?

Answer 1:

Such a thing is called a Visitor. It traverses the tree and does whatever has to be done, for example optimizing or generating code.
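For illustration, here is a minimal sketch of a visitor that regenerates source text from a small expression tree. The node classes (Num, BinOp) and the Unparser class are made up for this example and do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


class Unparser:
    """Hypothetical visitor that turns a tree back into source text."""

    def visit(self, node):
        # Dispatch on the node's class name, e.g. BinOp -> visit_BinOp.
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_Num(self, node):
        return str(node.value)

    def visit_BinOp(self, node):
        # Fully parenthesize so the text describes exactly this tree.
        left = self.visit(node.left)
        right = self.visit(node.right)
        return f"({left} {node.op} {right})"


tree = BinOp("*", BinOp("+", Num(1), Num(2)), Num(3))
print(Unparser().visit(tree))  # prints ((1 + 2) * 3)
```

A second visitor with the same shape could evaluate or optimize the tree instead; only the visit_ methods would change.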

Answer 2:

I rather like lewap's response:

find a mathematical way to express a visitor and you have a dual to the parser

But you asked for a sample, so try this on for size: Visual Studio contains a UML editor with excellent symmetry. Both it and the other editors are implemented as views of one underlying model, so an edit in any of them modifies the model and all of them stay in sync.

Answer 3:

Actually, generating code from a parse tree is strictly easier than parsing code, at least in a mathematical sense. Many grammars are ambiguous, that is, some strings have more than one parse tree, but a parse tree can always be converted to a string in a unique way, modulo whitespace.
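A small sketch of that asymmetry, assuming the ambiguous toy grammar E -> E "-" E | number and representing parse trees as nested tuples (both are assumptions made for this example). Reading off the leaves gives exactly one string per tree, while the string itself has two trees.

```python
def flatten(tree):
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(tree, tuple):
        return " ".join(flatten(child) for child in tree)
    return tree


# Two different parse trees of the same text under E -> E "-" E | number.
left_assoc = (("1", "-", "2"), "-", "3")   # means (1 - 2) - 3
right_assoc = ("1", "-", ("2", "-", "3"))  # means 1 - (2 - 3)

# Tree -> string is a plain function: each tree yields exactly one string.
print(flatten(left_assoc))
print(flatten(right_assoc))

# String -> tree is not: both trees yield the same text.
assert flatten(left_assoc) == flatten(right_assoc)
```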

The Dragon book gives a good description of the theory of parsers.

Answer 4:

There is theory, along with working implementations and examples, of reversible parsing in Haskell. The library is by Paweł Nowak. Please refer to https://hackage.haskell.org/package/syntax as your starting point. You can find the examples at the following URLs.

https://hackage.haskell.org/package/syntax-example
https://hackage.haskell.org/package/syntax-example-json

Answer 5:

I don't know where to find much about the theory, but boost::spirit 2.0 has both qi (parser) and karma (generator), sharing the same underlying structure and grammar, so it's a practical implementation of the concept.

Documentation on the generator side is still pretty thin (spirit2 was new in Boost 1.38, and is still in beta), but there are a few bits of karma sample code around, and AFAIK the library's in a working state and there are at least some examples available.

Answer 6:

In addition to 'Visitor', 'unparser' is another good keyword to web-search for.

Answer 7:

That sounds a lot like the back end of a non-optimizing compiler whose target language is the same as its source language.

One question would be whether you require the "unparsed" code to be identical to the original, or just functionally equivalent.

For example, would it be OK for the output to use a different indentation style than the original? That information wouldn't normally be stored in the AST because it's not semantically important.
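To make the distinction concrete, here is a short sketch using Python's standard ast module (ast.unparse needs Python 3.9 or later). The regenerated text differs from the original, since spacing, redundant parentheses and the comment are gone, yet it parses back to the same tree.

```python
import ast

source = "x   =  (1+2)   # the answer\n"

tree = ast.parse(source)
regenerated = ast.unparse(tree)
print(regenerated)

# Not textually identical: layout and the comment were never in the AST.
assert regenerated != source

# But functionally equivalent: both texts parse to the same tree.
assert ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```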

One thing to look at would be automatic code refactoring tools.

Answer 8:

I've been doing these forever, and calling them "DeParse".

It only gets tricky if you also want to recapture whitespace and comments. You have to tuck them into the parse tree so you can regenerate them on output.
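One common way to do the tucking, sketched here under my own assumptions (the Token class, the lex and emit functions, and the '#' comment syntax are all invented for this example): attach the whitespace and comments that precede each token to that token, so that writing the tokens back out reproduces the input exactly.

```python
import re
from dataclasses import dataclass

# Leading trivia (whitespace and '#' comments), then one optional token.
TOKEN = re.compile(r"(?P<trivia>(?:\s+|#[^\n]*)*)(?P<text>\w+|[^\s\w])?")


@dataclass
class Token:
    trivia: str  # whitespace and comments that preceded the token
    text: str    # the token itself ("" for the end-of-input marker)


def lex(source):
    tokens, pos = [], 0
    while True:
        match = TOKEN.match(source, pos)
        text = match.group("text") or ""
        tokens.append(Token(match.group("trivia"), text))
        pos = match.end()
        if not text:  # nothing but trivia was left: end of input
            return tokens


def emit(tokens):
    # Each token writes its own trivia first, so nothing is lost.
    return "".join(token.trivia + token.text for token in tokens)


source = "total = a +  b  # sum\n"
tokens = lex(source)
assert emit(tokens) == source
```

A parser built on such tokens can ignore the trivia field while building the tree, and the printer can use it again on the way out.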

Answer 9:

Our DMS Software Reengineering Toolkit insists on parsers and parser-inverses (called "prettyprinters") as "poker-ante" to mechanical processing (analyzing/transforming) of arbitrary languages. These provide full round-trip: source text to ASTs with captured position information (file/line/column) and comments, and AST to legal source text including regenerating the original token positions ("fidelity printing") or nicely formatted ("prettyprinting") options, including regeneration of the comments.

Parsers are often specified by a combination of grammars and lexical definitions of tokens; these notations are typically compiled into efficient parsing engines, and DMS does that for the "parser" side, as you might expect. Other folks here suggest that a "visitor" is the way to do prettyprinting, and, like assembly code, it is the right way to implement prettyprinting at the lowest level of abstraction. However, DMS prettyprinters are specified in terms of a text-box construction language over grammar terms, something like LaTeX, that enables one to control the placement of the various language elements horizontally, vertically, embedded, spaced, concatenated, laminated, etc. DMS compiles these into efficient low-level visitors (as other answers suggest) that implement the box generation. But as with the parser generator, you don't have to see all the ugly detail.
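To give a feel for box-based layout in general, here is a toy sketch. It is not DMS's notation or implementation; the functions text, h, v and indent are invented for this example, and a box is simply a list of text lines.

```python
def text(s):
    """A one-line box."""
    return [s]


def h(*boxes):
    """Horizontal: join one-line boxes on a single line, separated by spaces."""
    return [" ".join(box[0] for box in boxes)]


def v(*boxes):
    """Vertical: stack boxes on top of each other."""
    return [line for box in boxes for line in box]


def indent(n, box):
    """Shift a whole box n columns to the right."""
    return [" " * n + line for line in box]


def if_statement(condition, body):
    # Layout rule for one grammar term: header line, indented body, closer.
    return v(
        h(text("if"), text("(" + condition + ")"), text("{")),
        indent(4, v(*[text(statement + ";") for statement in body])),
        text("}"),
    )


layout = if_statement("x > 0", ["y = 1", "z = 2"])
print("\n".join(layout))
```

The layout rule says nothing about how the tree is walked; that part is what a tool could generate as a low-level visitor.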

DMS has some 30+ sets of these language front ends for various programming languages and formal notations, ranging from C++, C, Java, C#, COBOL, etc. to HTML, XML, assembly languages for some machines, temporal property specifications, specs for composable abstract algebras, etc.

Answer 10:

The "Visitor Pattern" idea is good. But I would consider the plain "Visitor" pattern to be a linear-list pattern, or a generic pattern, and add patterns for more specific cases like lists, matrices, and trees.

Look for a "Hierarchical Visitor Pattern" or "Tree Visitor Pattern" on the web.

You have a tree data structure ("Collection") and want to do something with the data, each time you "visit", "iterate" or "read" an item from the tree.

In your case, you have a tree data structure that represents the result of scanning/parsing some source code. Then you have to read each item's data and transform it into destination code.

Answer 11:

There are several "lens languages" that allow bidirectional transformation of source code.

It is also possible to implement reversible parsers using definite clause grammars in Prolog. In SWI-Prolog, the phrase/3 predicate converts parse trees into text and vice-versa. This book provides some additional examples of reversible parsing in Prolog.
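As a rough analogue of the "one grammar, two directions" idea, here is a hand-rolled Python sketch. It is not Prolog's DCG mechanism and not any particular lens language; the classes Int and ListOf are made up for this example. Each grammar element knows both how to parse text into a value and how to print that value back.

```python
import re

DIGITS = re.compile(r"\d+")


class Int:
    def parse(self, text, pos):
        match = DIGITS.match(text, pos)
        return (int(match.group()), match.end()) if match else None

    def unparse(self, value):
        return str(value)


class ListOf:
    """Bracketed, comma-separated list of some other grammar element."""

    def __init__(self, item):
        self.item = item

    def parse(self, text, pos):
        if not text.startswith("[", pos):
            return None
        pos += 1
        values = []
        while not text.startswith("]", pos):
            if values:  # every item after the first needs a comma
                if not text.startswith(",", pos):
                    return None
                pos += 1
            result = self.item.parse(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos + 1

    def unparse(self, values):
        return "[" + ",".join(self.item.unparse(v) for v in values) + "]"


grammar = ListOf(ListOf(Int()))
source = "[[1,2],[3],[]]"
value, end = grammar.parse(source, 0)
print(value)
assert grammar.unparse(value) == source
```

The round trip is exact only for text in canonical form: an input such as "[[007]]" parses fine but prints back as "[[7]]", which is the same identical-versus-equivalent question raised in other answers.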

Source: https://stackoverflow.com/questions/662041/theory-examples-of-reversible-parsers

Tags

language-agnostic

parsing

code-generation

theory