How to convert source code to a xml based representation of the ast?

陌路散爱 提交于 2019-12-03 03:33:05

a bit late but here is one: http://xmltranslator.appspot.com/sourcecodetoxml.html

I have implemented it myself and it converts PHP and Java to XML. It's free so enjoy!

Oana.

Ira Baxter

What didn't you understand about DMS?

It exists.

It has compiler accurate parsers/frontends for C, C++, Java, C#, COBOL (and many other languages).

It automatically builds full Abstract Syntax Trees for whatever it parses. Each AST node is stamped with file/line/column for the token that represents that start of that node, and the final column can be computed by a DMS API call.

It has a built-in option to generate XML from the ASTs, complete with node type, source position (as above), and any associated literal value. The command line call is:

 run DMSDomainParser ++XML  <path_to_your_file>

You can see what such an XML result looks like for Java.

You probably don't really want what you are wishing for. A 1000 C program may have 100K lines of #include file stuff. A line produces between 5-10 nodes. The DMS XML output is succint and each node only takes a line, so you are looking at ~~ 1 million lines of XML, of 60 characters each --> 60 million characters. That's a big file, and you probably don't want to process it with an XML-based tool.

DMS itself provides a vast amount of infrastructure for manipulating the ASTs it builds: traversing, pattern matching (against patterns coded essentially in source form), source-to-source transforms, control flow, data flow, points-to analysis, global call graphs. You'll find it amazingly hard to replicate all this machinery, and you're likely to need it to do anything interesting.

Moral: much better to use something like DMS to manipulate the AST directly, than to fight with XML.

Full disclosure: I'm the architect behind DMS.

There is GCC-XML at http://www.gccxml.org/HTML/Index.html - caveat; I haven't actually used it myself.

Only for Java, you can use BeautyJ.

You can launch it against your file with -xml.* options. For example:

java /your/dir/BeautyJ/lib/beautyj.jar beautyj -xml.out= -xml.doctype your_file.java

...and you get an XML representation of that file (and included ones).

BTW: the "-xml.out=" options specify an output file. Used in that way, with the trailing "=", it output to STDOUT. It's not an error.

srcml supports line number and column number. Here is an example using a java file called input.java (keep in mind srcml supports multiple languages, including C/C++) that contains the following:

public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }
}

Then run srcml with the command to enable keeping track of this extra position information:

srcml input.java --position

It produces the following AST in an XML format with line number and column number embedded:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.srcML.org/srcML/src" xmlns:pos="http://www.srcML.org/srcML/position" revision="0.9.5" language="Java" filename="input.java" pos:tabs="8"><class><specifier pos:line="1" pos:column="1">public<pos:position pos:line="1" pos:column="7"/></specifier> class <name pos:line="1" pos:column="14">HelloWorld<pos:position pos:line="1" pos:column="24"/></name> <block pos:line="1" pos:column="25">{
    <function><specifier pos:line="2" pos:column="5">public<pos:position pos:line="2" pos:column="11"/></specifier> <specifier pos:line="2" pos:column="12">static<pos:position pos:line="2" pos:column="18"/></specifier> <type><name pos:line="2" pos:column="19">void<pos:position pos:line="2" pos:column="23"/></name></type> <name pos:line="2" pos:column="24">main<pos:position pos:line="2" pos:column="28"/></name><parameter_list pos:line="2" pos:column="28">(<parameter><decl><type><name><name pos:line="2" pos:column="29">String<pos:position pos:line="2" pos:column="35"/></name><index pos:line="2" pos:column="35">[]<pos:position pos:line="2" pos:column="37"/></index></name></type> <name pos:line="2" pos:column="38">args<pos:position pos:line="2" pos:column="42"/></name></decl></parameter>)<pos:position pos:line="2" pos:column="43"/></parameter_list> <block pos:line="2" pos:column="44">{
    <comment type="line" pos:line="3" pos:column="9">// Prints "Hello, World" to the terminal window.</comment>
    <expr_stmt><expr><call><name><name pos:line="4" pos:column="9">System<pos:position pos:line="4" pos:column="15"/></name><operator pos:line="4" pos:column="15">.<pos:position pos:line="4" pos:column="16"/></operator><name pos:line="4" pos:column="16">out<pos:position pos:line="4" pos:column="19"/></name><operator pos:line="4" pos:column="19">.<pos:position pos:line="4" pos:column="20"/></operator><name pos:line="4" pos:column="20">println<pos:position pos:line="4" pos:column="27"/></name></name><argument_list pos:line="4" pos:column="27">(<argument><expr><literal type="string" pos:line="4" pos:column="28">"Hello, World"<pos:position pos:line="4" pos:column="42"/></literal></expr></argument>)<pos:position pos:line="4" pos:column="43"/></argument_list></call></expr>;<pos:position pos:line="4" pos:column="44"/></expr_stmt>
    }<pos:position pos:line="5" pos:column="6"/></block></function>
}<pos:position pos:line="6" pos:column="2"/></block></class></unit>

Reference: Documentation for srcml v0.9.5 (see srcml --help). I also use srcml frequently, including this feature to obtain position information.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!