I want to get an XML representation of the AST of Java and C code. I asked this question three months ago, but the solutions weren't comfortable for me.
What didn't you understand about DMS?
It exists.
It has compiler accurate parsers/frontends for C, C++, Java, C#, COBOL (and many other languages).
It automatically builds full Abstract Syntax Trees for whatever it parses. Each AST node is stamped with file/line/column for the token that represents the start of that node, and the final column can be computed by a DMS API call.
It has a built-in option to generate XML from the ASTs, complete with node type, source position (as above), and any associated literal value. The command line call is:
run DMSDomainParser ++XML <path_to_your_file>
You can see what such an XML result looks like for Java.
You probably don't really want what you are wishing for. A 1000-line C program may have 100K lines of #include file stuff. A line produces between 5 and 10 nodes. The DMS XML output is succinct and each node only takes a line, so you are looking at up to ~1 million lines of XML, of 60 characters each --> 60 million characters. That's a big file, and you probably don't want to process it with an XML-based tool.
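The estimate above can be checked with a few lines of arithmetic. This is only a sketch in Python: the function name estimate_xml_size is made up for illustration, and the inputs are the rough figures quoted above (100K lines after #include expansion, 5 to 10 nodes per line, about 60 characters per XML line), not measurements.

```python
def estimate_xml_size(source_lines, nodes_per_line, chars_per_xml_line):
    """Return (xml_lines, xml_chars) assuming one XML line per AST node."""
    xml_lines = source_lines * nodes_per_line
    xml_chars = xml_lines * chars_per_xml_line
    return xml_lines, xml_chars


# Rough figures from the text: 100K lines once #include files are expanded,
# 5 to 10 nodes per line, about 60 characters per XML line.
low = estimate_xml_size(100_000, 5, 60)
high = estimate_xml_size(100_000, 10, 60)

print(low)   # (500000, 30000000)
print(high)  # (1000000, 60000000)
```

So the range is roughly 30 to 60 million characters of XML for a single small C program.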
DMS itself provides a vast amount of infrastructure for manipulating the ASTs it builds: traversing, pattern matching (against patterns coded essentially in source form), source-to-source transforms, control flow, data flow, points-to analysis, global call graphs. You'll find it amazingly hard to replicate all this machinery, and you're likely to need it to do anything interesting.
Moral: much better to use something like DMS to manipulate the AST directly, than to fight with XML.
Full disclosure: I'm the architect behind DMS.
A bit late, but here is one: http://xmltranslator.appspot.com/sourcecodetoxml.html
I have implemented it myself and it converts PHP and Java to XML. It's free so enjoy!
Oana.
There is GCC-XML at http://www.gccxml.org/HTML/Index.html. Caveat: I haven't actually used it myself.
srcml supports line numbers and column numbers. Here is an example using a Java file called input.java (keep in mind srcml supports multiple languages, including C/C++) that contains the following:
public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }
}
Then run srcml with the --position option, which makes it keep track of this extra position information:
srcml input.java --position
It produces the following AST in an XML format with line number and column number embedded:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.srcML.org/srcML/src" xmlns:pos="http://www.srcML.org/srcML/position" revision="0.9.5" language="Java" filename="input.java" pos:tabs="8"><class><specifier pos:line="1" pos:column="1">public<pos:position pos:line="1" pos:column="7"/></specifier> class <name pos:line="1" pos:column="14">HelloWorld<pos:position pos:line="1" pos:column="24"/></name> <block pos:line="1" pos:column="25">{
<function><specifier pos:line="2" pos:column="5">public<pos:position pos:line="2" pos:column="11"/></specifier> <specifier pos:line="2" pos:column="12">static<pos:position pos:line="2" pos:column="18"/></specifier> <type><name pos:line="2" pos:column="19">void<pos:position pos:line="2" pos:column="23"/></name></type> <name pos:line="2" pos:column="24">main<pos:position pos:line="2" pos:column="28"/></name><parameter_list pos:line="2" pos:column="28">(<parameter><decl><type><name><name pos:line="2" pos:column="29">String<pos:position pos:line="2" pos:column="35"/></name><index pos:line="2" pos:column="35">[]<pos:position pos:line="2" pos:column="37"/></index></name></type> <name pos:line="2" pos:column="38">args<pos:position pos:line="2" pos:column="42"/></name></decl></parameter>)<pos:position pos:line="2" pos:column="43"/></parameter_list> <block pos:line="2" pos:column="44">{
<comment type="line" pos:line="3" pos:column="9">// Prints "Hello, World" to the terminal window.</comment>
<expr_stmt><expr><call><name><name pos:line="4" pos:column="9">System<pos:position pos:line="4" pos:column="15"/></name><operator pos:line="4" pos:column="15">.<pos:position pos:line="4" pos:column="16"/></operator><name pos:line="4" pos:column="16">out<pos:position pos:line="4" pos:column="19"/></name><operator pos:line="4" pos:column="19">.<pos:position pos:line="4" pos:column="20"/></operator><name pos:line="4" pos:column="20">println<pos:position pos:line="4" pos:column="27"/></name></name><argument_list pos:line="4" pos:column="27">(<argument><expr><literal type="string" pos:line="4" pos:column="28">"Hello, World"<pos:position pos:line="4" pos:column="42"/></literal></expr></argument>)<pos:position pos:line="4" pos:column="43"/></argument_list></call></expr>;<pos:position pos:line="4" pos:column="44"/></expr_stmt>
}<pos:position pos:line="5" pos:column="6"/></block></function>
}<pos:position pos:line="6" pos:column="2"/></block></class></unit>
Reference: Documentation for srcml v0.9.5 (see srcml --help). I also use srcml frequently, including this feature to obtain position information.
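If you then want to read the positions back out, something like the following should work. It is a minimal sketch using Python's standard xml.etree.ElementTree, run against a trimmed fragment of the output above. The helper name named_positions is my own, and the two namespace URIs are copied from the unit element shown above, so they may differ in other srcml versions.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the <unit> element of the output above.
SRC_NS = "http://www.srcML.org/srcML/src"
POS_NS = "http://www.srcML.org/srcML/position"

# Trimmed fragment of the srcml output (first line of the class only).
SAMPLE = (
    '<unit xmlns="http://www.srcML.org/srcML/src" '
    'xmlns:pos="http://www.srcML.org/srcML/position" '
    'language="Java" filename="input.java">'
    '<class><specifier pos:line="1" pos:column="1">public'
    '<pos:position pos:line="1" pos:column="7"/></specifier> class '
    '<name pos:line="1" pos:column="14">HelloWorld'
    '<pos:position pos:line="1" pos:column="24"/></name></class></unit>'
)


def named_positions(xml_text):
    """Return (identifier, line, column) for each <name> carrying a start position."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter(f"{{{SRC_NS}}}name"):
        line = elem.get(f"{{{POS_NS}}}line")
        column = elem.get(f"{{{POS_NS}}}column")
        # Compound names (a <name> wrapping other <name>s) have no position
        # attributes of their own in the output above, so they are skipped.
        if line is not None and column is not None and elem.text:
            results.append((elem.text, int(line), int(column)))
    return results


print(named_positions(SAMPLE))  # [('HelloWorld', 1, 14)]
```

For large outputs you would probably want a streaming parser instead of loading the whole tree at once.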
Only for Java, you can use BeautyJ.
You can launch it against your file with -xml.* options. For example:
java -cp /your/dir/BeautyJ/lib/beautyj.jar beautyj -xml.out= -xml.doctype your_file.java
...and you get an XML representation of that file (and included ones).
BTW: the "-xml.out=" option specifies an output file. Used in that way, with the trailing "=", it outputs to STDOUT. It's not an error.