How to convert source code to an XML-based representation of the AST?

囚心锁ツ 2021-02-06 07:51

I want to get an XML representation of the AST of Java and C code. I already asked this question three months ago, but the solutions weren't comfortable for me.

  • srcml seems t
5 answers
  •  夕颜 (original poster)
    2021-02-06 08:02

    What didn't you understand about DMS?

    It exists.

    It has compiler-accurate parsers/frontends for C, C++, Java, C#, COBOL (and many other languages).

    It automatically builds full Abstract Syntax Trees for whatever it parses. Each AST node is stamped with the file/line/column of the token that represents the start of that node, and the final column can be computed by a DMS API call.

    It has a built-in option to generate XML from the ASTs, complete with node type, source position (as above), and any associated literal value. The command line call is:

     run DMSDomainParser ++XML  
    

    You can see what such an XML result looks like for Java.
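
    For a rough feel of what an AST-as-XML dump contains (node type, source position, literal value), here is a minimal sketch using Python's standard `ast` and `xml.etree.ElementTree` modules. It is not DMS and does not reproduce DMS's output format; the element and attribute names (`node`, `type`, `line`, `col`, `value`) are made up for illustration, and it parses Python rather than Java or C.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node: ast.AST) -> ET.Element:
    """Convert a Python AST node into a nested XML element (illustrative schema)."""
    elem = ET.Element("node", {"type": type(node).__name__})
    # Only some nodes (statements, expressions) carry a source position.
    if hasattr(node, "lineno"):
        elem.set("line", str(node.lineno))
        elem.set("col", str(node.col_offset))
    # Record literal values and identifier names where present.
    if isinstance(node, ast.Constant):
        elem.set("value", repr(node.value))
    elif isinstance(node, ast.Name):
        elem.set("value", node.id)
    # Recurse into child nodes so the XML nesting mirrors the tree.
    for child in ast.iter_child_nodes(node):
        elem.append(ast_to_xml(child))
    return elem


def source_to_xml(source: str) -> str:
    """Parse Python source and return its AST as an XML string."""
    return ET.tostring(ast_to_xml(ast.parse(source)), encoding="unicode")


if __name__ == "__main__":
    print(source_to_xml("x = 1 + 2"))
```

    Even this one-line input yields eight nodes, which hints at how quickly such dumps grow.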

    You probably don't really want what you are wishing for. A 1000-line C program may pull in 100K lines of #include file content. A line produces between 5 and 10 nodes. The DMS XML output is succinct and each node takes only one line, so you are looking at roughly 1 million lines of XML of 60 characters each, or about 60 million characters. That's a big file, and you probably don't want to process it with an XML-based tool.
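
    The estimate above can be reproduced in a few lines of Python. The figures are the ones quoted in the paragraph (100K lines after includes, 5 to 10 nodes per line, one XML line of about 60 characters per node); the function name is only for illustration.

```python
def estimate_xml_size(source_lines: int, nodes_per_line: int, chars_per_node: int = 60):
    """Return (xml_lines, xml_chars) for an AST dump with one XML line per node."""
    xml_lines = source_lines * nodes_per_line
    return xml_lines, xml_lines * chars_per_node


if __name__ == "__main__":
    # Lower and upper bound of the nodes-per-line range quoted above.
    for nodes_per_line in (5, 10):
        lines, chars = estimate_xml_size(100_000, nodes_per_line)
        print(f"{nodes_per_line} nodes/line: {lines:,} XML lines, {chars:,} characters")
```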

    DMS itself provides a vast amount of infrastructure for manipulating the ASTs it builds: traversing, pattern matching (against patterns coded essentially in source form), source-to-source transforms, control flow, data flow, points-to analysis, global call graphs. You'll find it amazingly hard to replicate all this machinery, and you're likely to need it to do anything interesting.
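
    As a small illustration of what working on the AST directly looks like, as opposed to querying an XML dump, the following sketch uses Python's standard `ast.NodeVisitor` to find direct function calls. This is a stand-in, not DMS and not its API; the names `CallCollector` and `find_calls` are hypothetical, and it covers only simple traversal, none of the flow analyses listed above.

```python
import ast


class CallCollector(ast.NodeVisitor):
    """Collect (function name, line number) for calls of the form name(...)."""

    def __init__(self) -> None:
        self.calls = []

    def visit_Call(self, node: ast.Call) -> None:
        # Match only direct calls such as f(x), not method calls such as obj.f(x).
        if isinstance(node.func, ast.Name):
            self.calls.append((node.func.id, node.lineno))
        # Keep descending so nested calls are found as well.
        self.generic_visit(node)


def find_calls(source: str) -> list:
    """Parse Python source and return the direct calls it contains."""
    collector = CallCollector()
    collector.visit(ast.parse(source))
    return collector.calls


if __name__ == "__main__":
    print(find_calls("a = f(g(1))\nobj.h(k())\n"))
```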

    Moral: much better to use something like DMS to manipulate the AST directly, than to fight with XML.

    Full disclosure: I'm the architect behind DMS.
