How to convert source code to an XML-based representation of the AST?

囚心锁ツ 2021-02-06 07:51

I want to get an XML representation of the AST of Java and C code. I already asked this question three months ago, but the solutions weren't comfortable for me.

  • srcml seems t
5 answers
  • 2021-02-06 08:02

    What didn't you understand about DMS?

    It exists.

    It has compiler-accurate parsers/frontends for C, C++, Java, C#, COBOL (and many other languages).

    It automatically builds full Abstract Syntax Trees for whatever it parses. Each AST node is stamped with the file/line/column of the token that starts that node, and the final column can be computed by a DMS API call.

    It has a built-in option to generate XML from the ASTs, complete with node type, source position (as above), and any associated literal value. The command line call is:

     run DMSDomainParser ++XML  <path_to_your_file>
    

    You can see what such an XML result looks like for Java.
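    To make the idea of "XML from an AST" concrete without depending on DMS, here is a minimal Java sketch (assuming Java 16+ for records) of an AST whose nodes carry a type, the file/line/column of their first token and an optional literal value, serialized with one XML element per node line. The names `AstToXml`, `AstNode`, the `<node>` element and its `type`/`file`/`line`/`col`/`value` attributes are all invented for illustration; this is not DMS's API or its actual XML schema.

```java
import java.util.List;

public class AstToXml {

    // Hypothetical AST node: node type, position of its first token,
    // an optional literal value (null if none) and its children.
    public record AstNode(String type, String file, int line, int column,
                          String literal, List<AstNode> children) {

        // Convenience factory for leaf nodes that carry a literal value.
        public static AstNode leaf(String type, String file, int line, int column, String literal) {
            return new AstNode(type, file, line, column, literal, List.of());
        }
    }

    // Escape XML special characters so values are safe inside attributes.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                case '\'' -> sb.append("&apos;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // Pre-order walk: one element per line, indented two spaces per depth level.
    static void toXml(AstNode n, int depth, StringBuilder out) {
        out.append("  ".repeat(depth))
           .append("<node type=\"").append(escape(n.type()))
           .append("\" file=\"").append(escape(n.file()))
           .append("\" line=\"").append(n.line())
           .append("\" col=\"").append(n.column()).append('"');
        if (n.literal() != null) {
            out.append(" value=\"").append(escape(n.literal())).append('"');
        }
        if (n.children().isEmpty()) {
            out.append("/>\n"); // leaves are self-closing
            return;
        }
        out.append(">\n");
        for (AstNode child : n.children()) {
            toXml(child, depth + 1, out);
        }
        out.append("  ".repeat(depth)).append("</node>\n");
    }

    public static String toXml(AstNode root) {
        StringBuilder out = new StringBuilder();
        toXml(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built AST for the C expression `x + 1` starting at line 3, column 5.
        AstNode tree = new AstNode("add", "demo.c", 3, 5, null, List.of(
                AstNode.leaf("identifier", "demo.c", 3, 5, "x"),
                AstNode.leaf("int_literal", "demo.c", 3, 9, "1")));
        System.out.print(toXml(tree));
    }
}
```

    Running `main` prints an `add` element wrapping its two leaf children. A real front end would build the tree from a parser instead of by hand, which is exactly the hard part that tools like DMS, srcml or GCC-XML provide.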

    You probably don't really want what you are wishing for. A 1000-line C program may pull in 100K lines of #include file stuff. Each line produces between 5 and 10 nodes. The DMS XML output is succinct and each node takes only one line, so you are looking at roughly 1 million lines of XML of about 60 characters each, i.e. about 60 million characters. That's a big file, and you probably don't want to process it with an XML-based tool.
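    The back-of-the-envelope arithmetic above can be spelled out as a tiny Java program. The inputs (about 100K source lines, 5 to 10 nodes per line, 60 characters per XML line, one XML line per node) are the ones from the estimate; `XmlSizeEstimate` is a hypothetical name. Note that the 5-nodes-per-line end of the range gives about 30 million characters, so 60 million is the upper bound.

```java
public class XmlSizeEstimate {

    // Rough XML size: source lines x AST nodes per line x characters per XML line,
    // assuming each AST node is written as exactly one XML line.
    public static long estimateChars(long sourceLines, int nodesPerLine, int charsPerXmlLine) {
        return sourceLines * nodesPerLine * charsPerXmlLine;
    }

    public static void main(String[] args) {
        long lines = 100_000;     // ~100K lines, dominated by #include content
        int charsPerXmlLine = 60;
        System.out.println("low:  " + estimateChars(lines, 5, charsPerXmlLine));  // prints "low:  30000000"
        System.out.println("high: " + estimateChars(lines, 10, charsPerXmlLine)); // prints "high: 60000000"
    }
}
```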

    DMS itself provides a vast amount of infrastructure for manipulating the ASTs it builds: traversing, pattern matching (against patterns coded essentially in source form), source-to-source transforms, control flow, data flow, points-to analysis, global call graphs. You'll find it amazingly hard to replicate all this machinery, and you're likely to need it to do anything interesting.

    Moral: much better to use something like DMS to manipulate the AST directly, than to fight with XML.

    Full disclosure: I'm the architect behind DMS.

  • 2021-02-06 08:08

    A bit late, but here is one: http://xmltranslator.appspot.com/sourcecodetoxml.html

    I implemented it myself; it converts PHP and Java to XML. It's free, so enjoy!

    Oana.

  • 2021-02-06 08:14

    There is GCC-XML at http://www.gccxml.org/HTML/Index.html (caveat: I haven't actually used it myself).

  • 2021-02-06 08:16

    srcml supports line and column numbers. Here is an example using a Java file called input.java (keep in mind that srcml supports multiple languages, including C/C++) that contains the following:

    public class HelloWorld {
        public static void main(String[] args) {
            // Prints "Hello, World" to the terminal window.
            System.out.println("Hello, World");
        }
    }
    

    Then run srcml with the --position option so that it keeps track of this extra position information:

    srcml input.java --position
    

    It produces the following AST in XML format, with line and column numbers embedded:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <unit xmlns="http://www.srcML.org/srcML/src" xmlns:pos="http://www.srcML.org/srcML/position" revision="0.9.5" language="Java" filename="input.java" pos:tabs="8"><class><specifier pos:line="1" pos:column="1">public<pos:position pos:line="1" pos:column="7"/></specifier> class <name pos:line="1" pos:column="14">HelloWorld<pos:position pos:line="1" pos:column="24"/></name> <block pos:line="1" pos:column="25">{
        <function><specifier pos:line="2" pos:column="5">public<pos:position pos:line="2" pos:column="11"/></specifier> <specifier pos:line="2" pos:column="12">static<pos:position pos:line="2" pos:column="18"/></specifier> <type><name pos:line="2" pos:column="19">void<pos:position pos:line="2" pos:column="23"/></name></type> <name pos:line="2" pos:column="24">main<pos:position pos:line="2" pos:column="28"/></name><parameter_list pos:line="2" pos:column="28">(<parameter><decl><type><name><name pos:line="2" pos:column="29">String<pos:position pos:line="2" pos:column="35"/></name><index pos:line="2" pos:column="35">[]<pos:position pos:line="2" pos:column="37"/></index></name></type> <name pos:line="2" pos:column="38">args<pos:position pos:line="2" pos:column="42"/></name></decl></parameter>)<pos:position pos:line="2" pos:column="43"/></parameter_list> <block pos:line="2" pos:column="44">{
        <comment type="line" pos:line="3" pos:column="9">// Prints "Hello, World" to the terminal window.</comment>
        <expr_stmt><expr><call><name><name pos:line="4" pos:column="9">System<pos:position pos:line="4" pos:column="15"/></name><operator pos:line="4" pos:column="15">.<pos:position pos:line="4" pos:column="16"/></operator><name pos:line="4" pos:column="16">out<pos:position pos:line="4" pos:column="19"/></name><operator pos:line="4" pos:column="19">.<pos:position pos:line="4" pos:column="20"/></operator><name pos:line="4" pos:column="20">println<pos:position pos:line="4" pos:column="27"/></name></name><argument_list pos:line="4" pos:column="27">(<argument><expr><literal type="string" pos:line="4" pos:column="28">"Hello, World"<pos:position pos:line="4" pos:column="42"/></literal></expr></argument>)<pos:position pos:line="4" pos:column="43"/></argument_list></call></expr>;<pos:position pos:line="4" pos:column="44"/></expr_stmt>
        }<pos:position pos:line="5" pos:column="6"/></block></function>
    }<pos:position pos:line="6" pos:column="2"/></block></class></unit>
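    If you want to consume this output from Java, a namespace-aware DOM parse is enough to read the position attributes. This is a minimal sketch using only the JDK's `javax.xml.parsers` and `org.w3c.dom` APIs; it assumes the srcml 0.9.5-style `pos:line`/`pos:column` attributes and the namespace URIs shown above (other srcml versions may encode positions differently), and `SrcmlPositions`/`namePositions` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SrcmlPositions {

    // Namespace URIs as they appear in the srcml output above.
    static final String SRC_NS = "http://www.srcML.org/srcML/src";
    static final String POS_NS = "http://www.srcML.org/srcML/position";

    // Returns "text@line:column" for every <name> element that has position attributes.
    public static List<String> namePositions(String srcmlXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed so the pos: prefix resolves to POS_NS
        Document doc = f.newDocumentBuilder().parse(
                new ByteArrayInputStream(srcmlXml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        NodeList names = doc.getElementsByTagNameNS(SRC_NS, "name"); // document order
        for (int i = 0; i < names.getLength(); i++) {
            Element e = (Element) names.item(i);
            if (!e.hasAttributeNS(POS_NS, "line")) {
                continue; // compound names such as System.out.println carry no position
            }
            // Leaf names hold the identifier text plus an empty <pos:position/>,
            // so getTextContent() yields just the identifier.
            result.add(e.getTextContent() + "@" + e.getAttributeNS(POS_NS, "line")
                    + ":" + e.getAttributeNS(POS_NS, "column"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // A trimmed excerpt of the srcml output above.
        String xml = "<unit xmlns=\"http://www.srcML.org/srcML/src\""
                + " xmlns:pos=\"http://www.srcML.org/srcML/position\">"
                + "<class><name pos:line=\"1\" pos:column=\"14\">HelloWorld"
                + "<pos:position pos:line=\"1\" pos:column=\"24\"/></name></class></unit>";
        System.out.println(namePositions(xml)); // prints [HelloWorld@1:14]
    }
}
```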
    

    Reference: Documentation for srcml v0.9.5 (see srcml --help). I also use srcml frequently, including this feature to obtain position information.
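    To run srcml from a Java program instead of a shell, something like the following `ProcessBuilder` sketch should work. It assumes the `srcml` executable is installed and on the `PATH`, reuses only the `srcml <file> --position` invocation from this answer, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class RunSrcml {

    // Same command line as in the answer: srcml <file> --position
    public static List<String> srcmlCommand(String sourceFile) {
        return List.of("srcml", sourceFile, "--position");
    }

    // Runs a command and returns its stdout; stderr is passed through to this
    // process's stderr, and a non-zero exit code is reported as an IOException.
    public static String run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String out;
        try (InputStream in = p.getInputStream()) {
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int code = p.waitFor();
        if (code != 0) {
            throw new IOException("command " + command + " exited with " + code);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "input.java";
        System.out.print(run(srcmlCommand(file)));
    }
}
```

    The returned string can then be handed to any XML parser. Reading stdout completely before calling `waitFor()` avoids a stall when the output is larger than the OS pipe buffer, which is likely for real source files.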

  • 2021-02-06 08:25

    For Java only, you can use BeautyJ.

    You can run it on your file with the -xml.* options. For example:

    java -cp /your/dir/BeautyJ/lib/beautyj.jar beautyj -xml.out= -xml.doctype your_file.java
    

    ...and you get an XML representation of that file (and included ones).

    BTW: the "-xml.out=" option specifies an output file. Used this way, with nothing after the trailing "=", it writes to STDOUT. That's not an error.
