Recommend C front-end that preserves preprocessor directives

说谎 2021-01-14 00:19

I'd like to start a project that involves transforming C code, but I'd like to include the preprocessor directives. I don't want to reinvent th

4 answers
  •  北恋 (original poster)
    2021-01-14 01:07

    Our DMS Software Reengineering Toolkit has a C front end (and a C++ front end) that:

    • parses (compilable) C source code in a variety of dialects into ASTs,
    • preserves the preprocessor directives in most cases as AST nodes
    • can regenerate compilable C code (with comments and preprocessor directives) from the ASTs
    • can collect thousands of files in a single image to allow cross-file analysis and transformation
    • provides full symbol table construction and access
    • provides procedural access to ASTs with a large AST manipulation library, including navigate, inspect, insert, delete, replace, match, ...
    • provides source-to-source transformations using patterns written in the C notation that match against the ASTs

    For C (not yet for C++), DMS also provides:

    • control and data flow analysis
    • local and global points-to analysis
    • global call graph construction

    DMS has been used to process extremely large C applications for the purposes of extracting facts and generating new, derived code from the original source base.
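To make the idea of directives living in the tree concrete, here is a minimal sketch in C. It is not DMS's API or data model: the `Node` type, the `NodeKind` values, and the functions `emit_line`, `regenerate`, and `regenerate_example` are all hypothetical names invented for this illustration. It only shows that once a directive is an ordinary node, regenerating source is a plain in-order tree walk that emits the directive where it was found.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds for this sketch; not DMS's node types. */
typedef enum { NODE_BLOCK, NODE_STATEMENT, NODE_DIRECTIVE } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;                /* leaf text; NULL for NODE_BLOCK */
    const struct Node *const *kids;  /* children; NULL for leaves */
    size_t nkids;
} Node;

/* Append one line to the NUL-terminated buffer `out`, indented by
 * 2*depth spaces. snprintf truncates if the line does not fit. */
static void emit_line(char *out, size_t cap, int depth, const char *text)
{
    size_t used = strlen(out);
    if (used + 1 >= cap)
        return;                      /* buffer full: nothing more can be appended */
    snprintf(out + used, cap - used, "%*s%s\n", depth * 2, "", text);
}

/* Walk the tree in order; directives are emitted like any other node. */
static void regenerate(const Node *n, char *out, size_t cap, int depth)
{
    size_t i;
    assert(n != NULL && out != NULL);
    switch (n->kind) {
    case NODE_DIRECTIVE:
        emit_line(out, cap, 0, n->text);     /* this sketch puts directives at column 1 */
        break;
    case NODE_STATEMENT:
        emit_line(out, cap, depth, n->text);
        break;
    case NODE_BLOCK:
        emit_line(out, cap, depth, "{");
        for (i = 0; i < n->nkids; i++)
            regenerate(n->kids[i], out, cap, depth + 1);
        emit_line(out, cap, depth, "}");
        break;
    }
}

/* Build a toy tree for a block holding an #ifdef/#else/#endif group
 * and regenerate its text into `out`. */
static void regenerate_example(char *out, size_t cap)
{
    static const Node ifdef_ = { NODE_DIRECTIVE, "#ifdef CAN_OPEN_IT", NULL, 0 };
    static const Node open_  = { NODE_STATEMENT, "f = fopen(FILENAME, \"r\");", NULL, 0 };
    static const Node else_  = { NODE_DIRECTIVE, "#else", NULL, 0 };
    static const Node fail_  = { NODE_STATEMENT, "printf(\"Unable to open file.\\n\");", NULL, 0 };
    static const Node endif_ = { NODE_DIRECTIVE, "#endif", NULL, 0 };
    static const Node *const kids[] = { &ifdef_, &open_, &else_, &fail_, &endif_ };
    static const Node body = { NODE_BLOCK, NULL, kids, sizeof kids / sizeof kids[0] };

    if (cap == 0)
        return;
    out[0] = '\0';
    regenerate(&body, out, cap, 0);
}
```

Calling `regenerate_example(buf, sizeof buf)` with a buffer of a few hundred bytes fills it with the braces, the three directives, and the two statements in their original order. A real front end also has to decide where directives may legally appear in the grammar, which this toy tree sidesteps entirely.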

    (EDIT: Feb 2016)

    It can handle the OP's example (with slight fixes to make it valid). Here's the slightly revised source:

    #define FILENAME "filename"
    #include <stdio.h>
    
    FILE *f;
    main() {
      f=0;
    if (file_is_open) {
    #ifdef CAN_OPEN_IT
        f = fopen(FILENAME, "r");
    #else
        printf("Unable to open file.\n");
    #endif
    }
    
    }
    

    Here is the AST produced:

    C~GCC4 Domain Parser Version 3.0.1(28449)
    Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
    Powered by DMS (R) Software Reengineering Toolkit
    AST Optimizations: remove constant tokens, remove unary productions, compact sequences
    Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I
    (translation_unit@C~GCC4=2#4a7e0e0^0 Line 1 Column 1 File C:/temp/test.c
     (declaration_seq@C~GCC4=605#4a77580^1#4a7e0e0:1 {4} Line 1 Column 1 File C:/temp/test.c
      (control_line@C~GCC4=1094#4a775c0^1#4a77580:1 Line 1 Column 1 File C:/temp/test.c
       ('#'@C~GCC4=1548#4a771c0^1#4a775c0:1[Keyword:0] Line 1 Column 1 File C:/temp/test.c)'#'
       (IDENTIFIER@C~GCC4=1531#4a77200^1#4a775c0:2[`FILENAME'] Line 1 Column 9 File C:/temp/test.c)IDENTIFIER
       (@C~GCC4=1603#4a77180^2#4a775c0:3#4a7f300:1[`FILENAME'] Line 1 Column 18 File C:/temp/test.c
    $VOID$ [Child 1]
       |(STRING_LITERAL@C~GCC4=1525#4a77160^2#4a77180:2#4a7f300:2[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
    $VOID$ [Child 3]
       )#4a77180
       (new_line@C~GCC4=1578#4a77260^1#4a775c0:4[Keyword:0] Line 1 Column 28 File C:/temp/test.c)new_line
      )control_line#4a775c0
      (control_line@C~GCC4=1104#4a77460^1#4a77580:2 Line 2 Column 1 File C:/temp/test.c
       ('#'@C~GCC4=1548#4a77340^1#4a77460:1[Keyword:0] Line 2 Column 1 File C:/temp/test.c)'#'
       (ANGLED_HEADER_NAME@C~GCC4=1589#4a77380^1#4a77460:2[`stdio.h'] Line 2 Column 10 File C:/temp/test.c)ANGLED_HEADER_NAME
       (new_line@C~GCC4=1578#4a773c0^1#4a77460:3[Keyword:0] Line 2 Column 19 File C:/temp/test.c)new_line
      )control_line#4a77460
      (simple_declaration@C~GCC4=631#4a774c0^1#4a77580:3 Line 4 Column 1 File C:/temp/test.c
       (IDENTIFIER@C~GCC4=1531#4a77360^1#4a774c0:1[`FILE'] Line 4 Column 1 File C:/temp/test.c)IDENTIFIER
       (declarator@C~GCC4=850#4a77520^1#4a774c0:2 Line 4 Column 6 File C:/temp/test.c
       |(ptr_operator@C~GCC4=866#4a77560^1#4a77520:1 Line 4 Column 6 File C:/temp/test.c)ptr_operator
       |(IDENTIFIER@C~GCC4=1531#4a77480^1#4a77520:2[`f'] Line 4 Column 7 File C:/temp/test.c)IDENTIFIER
       )declarator#4a77520
      )simple_declaration#4a774c0
      (function_definition@C~GCC4=966#4a77be0^1#4a77580:4 Line 5 Column 1 File C:/temp/test.c
       (direct_declarator@C~GCC4=852#4a77440^1#4a77be0:1 Line 5 Column 1 File C:/temp/test.c
       |(IDENTIFIER@C~GCC4=1531#4a774e0^1#4a77440:1[`main'] Line 5 Column 1 File C:/temp/test.c)IDENTIFIER
       |(parameter_declaration_clause@C~GCC4=900#4a77220^1#4a77440:2 Line 5 Column 6 File C:/temp/test.c)parameter_declaration_clause
       )direct_declarator#4a77440
       (compound_statement@C~GCC4=507#4a77b20^1#4a77be0:2 Line 5 Column 8 File C:/temp/test.c
       |(statement_seq@C~GCC4=511#4a77d20^1#4a77b20:1 {2} Line 6 Column 3 File C:/temp/test.c
       | (AMBIGUITY@C~GCC4=1602#4a77680^1#4a77d20:1{2} Line 6 Column 3 File C:/temp/test.c
       |  (expression_statement@C~GCC4=503#4a7e040^1#4a77680:1 Line 6 Column 3 File C:/temp/test.c
       |   (assignment_expression@C~GCC4=457#4a77f00^1#4a7e040:1 Line 6 Column 3 File C:/temp/test.c
       |   |(assignment_target@C~GCC4=470#4a77a00^1#4a77f00:1 Line 6 Column 3 File C:/temp/test.c
       |   | (IDENTIFIER@C~GCC4=1531#4a77400^2#4a77a00:1#4a77fc0:1[`f'] Line 6 Column 3 File C:/temp/test.c)IDENTIFIER
       |   |)assignment_target#4a77a00
       |   |(INT_LITERAL@C~GCC4=1471#4a77a60^2#4a77f00:2#4a77f60:1[0] Line 6 Column 5 File C:/temp/test.c)INT_LITERAL
       |   )assignment_expression#4a77f00
       |  )expression_statement#4a7e040
       |  (simple_declaration@C~GCC4=630#4a7e060^1#4a77680:2 Line 6 Column 3 File C:/temp/test.c
       |   (init_declarator@C~GCC4=835#4a77fc0^1#4a7e060:1 Line 6 Column 3 File C:/temp/test.c
       |   |(IDENTIFIER@C~GCC4=1531#4a77400^2... [ALREADY PRINTED] ...)
       |   |(initializer@C~GCC4=983#4a77f60^1#4a77fc0:2 Line 6 Column 4 File C:/temp/test.c
       |   | (INT_LITERAL@C~GCC4=1471#4a77a60^2... [ALREADY PRINTED] ...)
       |   |)initializer#4a77f60
       |   )init_declarator#4a77fc0
       |  )simple_declaration#4a7e060
       | )AMBIGUITY#4a77680
       | (selection_statement@C~GCC4=527#4a77b40^1#4a77d20:2 Line 7 Column 1 File C:/temp/test.c
       |  (IDENTIFIER@C~GCC4=1531#4a7e0c0^1#4a77b40:1[`file_is_open'] Line 7 Column 5 File C:/temp/test.c)IDENTIFIER
       |  (compound_statement@C~GCC4=507#4a77ae0^1#4a77b40:2 Line 7 Column 19 File C:/temp/test.c
       |   (statement@C~GCC4=490#4a7f840^1#4a77ae0:1 Line 8 Column 1 File C:/temp/test.c
       |   |(if_directive@C~GCC4=1088#4a7f1c0^1#4a7f840:1 Line 8 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f240^1#4a7f1c0:1[Keyword:0] Line 8 Column 1 File C:/temp/test.c)'#'
       |   | (IDENTIFIER@C~GCC4=1531#4a7ee60^1#4a7f1c0:2[`CAN_OPEN_IT'] Line 8 Column 8 File C:/temp/test.c)IDENTIFIER
       |   | (new_line@C~GCC4=1578#4a7f1e0^1#4a7f1c0:3[Keyword:0] Line 8 Column 19 File C:/temp/test.c)new_line
       |   |)if_directive#4a7f1c0
       |   |(AMBIGUITY@C~GCC4=1602#4a77d40^1#4a7f840:2{2} Line 9 Column 5 File C:/temp/test.c
       |   | (expression_statement@C~GCC4=503#4a7f4a0^1#4a77d40:1 Line 9 Column 5 File C:/temp/test.c
       |   |  (assignment_expression@C~GCC4=457#4a7f3c0^1#4a7f4a0:1 Line 9 Column 5 File C:/temp/test.c
       |   |   (assignment_target@C~GCC4=470#4a7eec0^1#4a7f3c0:1 Line 9 Column 5 File C:/temp/test.c
       |   |   |(IDENTIFIER@C~GCC4=1531#4a7eee0^2#4a7eec0:1#4a7f400:1[`f'] Line 9 Column 5 File C:/temp/test.c)IDENTIFIER
       |   |   )assignment_target#4a7eec0
       |   |   (postfix_expression@C~GCC4=201#4a7f2e0^1#4a7f3c0:2 Line 9 Column 9 File C:/temp/test.c
       |   |   |(IDENTIFIER@C~GCC4=1531#4a7f120^2#4a7f2e0:1#4a7f160:1[`fopen'] Line 9 Column 9 File C:/temp/test.c)IDENTIFIER
       |   |   |(expression_list@C~GCC4=228#4a7f260^2#4a7f2e0:2#4a7f160:2 Line 9 Column 15 File C:/temp/test.c
       |   |   | (@C~GCC4=1607#4a7f300^1#4a7f260:1[`FILENAME'] Line 9 Column 15 File C:/temp/test.c
       |   |   |  (@C~GCC4=1603#4a77180^2... [ALREADY PRINTED] ...)
       |   |   |  (STRING_LITERAL@C~GCC4=1525#4a77160^2... [ALREADY PRINTED] ...)
       |   |   |  $VOID$ [Child 3]
       |   |   |  (STRING_LITERAL@C~GCC4=1525#4a7f2c0^1#4a7f300:4[`filename'] Line 1 Column 18 File C:/temp/test.c)STRING_LITERAL
       |   |   |  $VOID$ [Child 5]
       |   |   | )#4a7f300
       |   |   | (STRING_LITERAL@C~GCC4=1525#4a7f140^1#4a7f260:2[`r'] Line 9 Column 25 File C:/temp/test.c)STRING_LITERAL
       |   |   |)expression_list#4a7f260
       |   |   )postfix_expression#4a7f2e0
       |   |  )assignment_expression#4a7f3c0
       |   | )expression_statement#4a7f4a0
       |   | (simple_declaration@C~GCC4=630#4a7f480^1#4a77d40:2 Line 9 Column 5 File C:/temp/test.c
       |   |  (init_declarator@C~GCC4=835#4a7f400^1#4a7f480:1 Line 9 Column 5 File C:/temp/test.c
       |   |   (IDENTIFIER@C~GCC4=1531#4a7eee0^2... [ALREADY PRINTED] ...)
       |   |   (initializer@C~GCC4=983#4a7f3e0^1#4a7f400:2 Line 9 Column 7 File C:/temp/test.c
       |   |   |(postfix_expression@C~GCC4=201#4a7f160^1#4a7f3e0:1 Line 9 Column 9 File C:/temp/test.c
       |   |   | (IDENTIFIER@C~GCC4=1531#4a7f120^2... [ALREADY PRINTED] ...)
       |   |   | (expression_list@C~GCC4=228#4a7f260^2... [ALREADY PRINTED] ...)
       |   |   |)postfix_expression#4a7f160
       |   |   )initializer#4a7f3e0
       |   |  )init_declarator#4a7f400
       |   | )simple_declaration#4a7f480
       |   |)AMBIGUITY#4a77d40
       |   |(else_directive@C~GCC4=1091#4a7f4c0^1#4a7f840:3 Line 10 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f500^1#4a7f4c0:1[Keyword:0] Line 10 Column 1 File C:/temp/test.c)'#'
       |   | (new_line@C~GCC4=1578#4a7f4e0^1#4a7f4c0:2[Keyword:0] Line 10 Column 6 File C:/temp/test.c)new_line
       |   |)else_directive#4a7f4c0
       |   |(expression_statement@C~GCC4=503#4a7f7c0^1#4a7f840:4 Line 11 Column 5 File C:/temp/test.c
       |   | (postfix_expression@C~GCC4=201#4a77ba0^1#4a7f7c0:1 Line 11 Column 5 File C:/temp/test.c
       |   |  (IDENTIFIER@C~GCC4=1531#4a7f640^1#4a77ba0:1[`printf'] Line 11 Column 5 File C:/temp/test.c)IDENTIFIER
       |   |  (STRING_LITERAL@C~GCC4=1525#4a77c20^1#4a77ba0:2[`Unable to open file.
    '] Line 11 Column 12 File C:/temp/test.c)STRING_LITERAL
       |   | )postfix_expression#4a77ba0
       |   |)expression_statement#4a7f7c0
       |   |(endif_directive@C~GCC4=1092#4a7f7e0^1#4a7f840:5 Line 12 Column 1 File C:/temp/test.c
       |   | ('#'@C~GCC4=1548#4a7f720^1#4a7f7e0:1[Keyword:0] Line 12 Column 1 File C:/temp/test.c)'#'
       |   | (new_line@C~GCC4=1578#4a7f700^1#4a7f7e0:2[Keyword:0] Line 12 Column 7 File C:/temp/test.c)new_line
       |   |)endif_directive#4a7f7e0
       |   )statement#4a7f840
       |  )compound_statement#4a77ae0
       | )selection_statement#4a77b40
       |)statement_seq#4a77d20
       )compound_statement#4a77b20
      )function_definition#4a77be0
     )declaration_seq#4a77580
    )translation_unit#4a7e0e0
    

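    You can see the preprocessor directives in the tree as nodes of their own: the #define and #include appear as "control_line" nodes (source lines 1 and 2), and the conditional appears as "if_directive", "else_directive", and "endif_directive" nodes (source lines 8, 10, and 12).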
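If you want to pick the directive nodes out of a dump like this mechanically, a small line scanner is enough. The sketch below is an assumption-laden illustration, not a DMS tool: the function name `parse_directive_line` is hypothetical, and the line layout it expects (`(node_name@... Line N Column M ...`) is inferred only from the listing above, so it may not hold for other DMS versions or options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Inspect one line of the AST dump. If it opens a preprocessor node
 * ("control_line" or any name ending in "_directive"), copy the node
 * name into `name`, store the source line in `*src_line`, and return 1.
 * Otherwise return 0 (`name` may have been overwritten). */
static int parse_directive_line(const char *line, char *name, size_t cap,
                                int *src_line)
{
    static const char suffix[] = "_directive";
    const size_t suffix_len = sizeof suffix - 1;
    const char *open, *at, *pos;
    size_t len;

    assert(line != NULL && name != NULL && src_line != NULL);

    open = strchr(line, '(');            /* node-opening lines contain '(' */
    if (open == NULL)
        return 0;
    at = strchr(open, '@');              /* the node name ends at '@' */
    if (at == NULL)
        return 0;
    len = (size_t)(at - open - 1);
    if (len == 0 || len >= cap)
        return 0;                        /* unnamed node, or name too long */
    memcpy(name, open + 1, len);
    name[len] = '\0';

    if (strcmp(name, "control_line") != 0 &&
        !(len >= suffix_len &&
          strcmp(name + len - suffix_len, suffix) == 0))
        return 0;                        /* not a preprocessor node */

    pos = strstr(at, " Line ");          /* position info follows the node id */
    if (pos == NULL)
        return 0;
    return sscanf(pos, " Line %d", src_line) == 1;
}
```

Feeding the dump to this function one line at a time reports each directive node once, on its opening line; the matching closing lines such as `)if_directive#4a7f1c0` contain no `(` and are skipped.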

    Yes, DMS can prettyprint this tree, too. The following command runs the parser to produce an AST, and then runs the DMS prettyprinter to regenerate source solely from the tree. The round-trip is accurate; you can recompile and get the same result. Comments are preserved, too.

    C:\DMS\Domains\C\GCC4\Tools\PrettyPrinter>run domainprettyprinter \temp\test.c
    C~GCC4 PrettyPrinter Version 1.2.13
    Copyright (C) 2004-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
    Powered by DMS (R) Software Reengineering Toolkit
    
    #define FILENAME "filename"
    #include <stdio.h>
    FILE *f;
    
    main()
    {
      f = 0;
      if (file_is_open)
        {
          #ifdef CAN_OPEN_IT
            f = fopen(FILENAME, "r");
          #else
            printf("Unable to open file.\n");
          #endif
        }
    }
    

    You can see how DMS handles C++. At this point it handles all of C++14 for GCC and MS dialects.
