How can I strip multiline C comments from a file using Perl?

情话喂你 2020-12-03 04:02

Can anyone help me with a regular expression to strip multiline comments and single-line comments from a file?

eg:

                  \" WHOLE         


        
6 answers
  • 2020-12-03 04:37

    From perlfaq6 "How do I use a regular expression to strip C style comments from a file?":


    While this actually can be done, it's much harder than you'd think. For example, this one-liner

    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
    

    will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, created by Jeffrey Friedl and later modified by Fred Curtis.

    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    print;
    

    This could, of course, be more legibly written with the /x modifier, adding whitespace and comments. Here it is expanded, courtesy of Fred Curtis.

    s{
       /\*         ##  Start of /* ... */ comment
       [^*]*\*+    ##  Non-* followed by 1-or-more *'s
       (
         [^/*][^*]*\*+
       )*          ##  0-or-more things which don't start with /
                   ##    but do end with '*'
       /           ##  End of /* ... */ comment
    
     |         ##     OR  various things which aren't comments:
    
       (
         "           ##  Start of " ... " string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^"\\]        ##  Non "\
         )*
         "           ##  End of " ... " string
    
       |         ##     OR
    
         '           ##  Start of ' ... ' string
         (
           \\.           ##  Escaped char
         |               ##    OR
           [^'\\]        ##  Non '\
         )*
         '           ##  End of ' ... ' string
    
       |         ##     OR
    
         .           ##  Any other char
         [^/"'\\]*   ##  Chars which don't start a comment, string or escape
       )
     }{defined $2 ? $2 : ""}gxse;
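
    To see why the string handling matters, here is a minimal sketch that runs the simple one-liner pattern and the string-aware pattern on the same input. The sample line in `$src` and the variable names are made up for illustration; the two regexes are the ones quoted above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample: a string literal that looks like a comment,
    # followed by a real comment.
    my $src = 'printf("/* not a comment */"); /* real comment */';

    # Simple approach: delete everything between /* and the next */.
    (my $naive = $src) =~ s{/\*.*?\*/}{}gs;

    # String-aware approach: comments are dropped, while strings and
    # other text are captured in $2 and put back unchanged.
    (my $careful = $src) =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;

    print "naive:   [$naive]\n";
    print "careful: [$careful]\n";
    ```

    The simple pattern empties the string literal, while the longer one leaves it alone and removes only the trailing comment.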
    

    A slight modification also removes C++ comments, possibly spanning multiple lines using a continuation character:

     s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
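
    As a rough check of the C++ variant, the sketch below applies it to a two-line sample that contains a `//` comment, a `/* */` comment, and a string with `//` inside it. The sample input and variable names are assumptions for illustration only.

    ```perl
    use strict;
    use warnings;

    # Hypothetical two-line input: a // comment on the first line, and a
    # string containing // plus a /* */ comment on the second line.
    my $src = 'int x = 1; // set x' . "\n"
            . 'const char *u = "http://e.com"; /* url */' . "\n";

    # The C++ variant quoted above: non-comment text is captured in $3.
    (my $out = $src) =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

    print $out;
    ```

    The `//` inside the string survives. Note that the pattern also consumes the newline that ends a `//` comment, so in this sample the two source lines end up joined on one line; if line numbers matter, the replacement would need to put that newline back.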
    
  • 2020-12-03 04:39

    As is often the case in Perl, you can reach for CPAN: Regexp::Common::comment should help you. The one language I found that uses the comments you described is Nickle, but maybe PHP comments would be OK (// can also start a single-line comment).

    Note that, in any case, using regexps to strip out comments is risky; a full parser for the language is much safer. A regexp-based stripper, for example, is likely to get confused by something like print "/*";.

  • 2020-12-03 04:51

    There is also a non-Perl answer: use the program stripcmt:

    StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In - First Out) filter or accept arguments on the commandline.

  • 2020-12-03 04:51

    Remove /* */ comments (including multi-line)

    s/\/\*.*?\*\///gs
    

    I post this because it is simple; however, I believe it will trip up on nested comments, because the match stops at the first closing */. For example:

    /* sdafsdfsdf /*sda asd*/ asdsdf */
    

    But since those are fairly uncommon, I prefer the simple regex.
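
    A minimal sketch of that edge case, using the same substitution on the nested example above (the variable names are just for illustration):

    ```perl
    use strict;
    use warnings;

    # The nested-comment example from this answer.
    my $nested = '/* sdafsdfsdf /*sda asd*/ asdsdf */';

    # Same substitution as above: non-greedy, so it stops at the first */.
    (my $stripped = $nested) =~ s/\/\*.*?\*\///gs;

    # Brackets make the leftover leading space visible.
    print "[$stripped]\n";
    ```

    Everything after the first */ is left behind. Standard C does not nest /* */ comments either, so a C compiler would also end the comment at that first */.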

  • 2020-12-03 04:52

    Including tests:

    use strict;
    use warnings;
    use Test::More qw(no_plan);
    sub strip_comments {
      my $string=shift;
      $string =~ s#/\*.*?\*/##sg; #strip multiline C comments
      return $string;
    }
    is(strip_comments('a/* comment1 */  code   /* comment2 */b'),'a  code   b');
    is(strip_comments('a/* comment1 /* comment2 */b'),'ab');
    is(strip_comments("a/* comment1\n\ncomment */ code /* comment2 */b"),'a code b');
    
  • 2020-12-03 04:53

    This is a FAQ:

    perldoc -q comment
    

    Found in perlfaq6:

    How do I use a regular expression to strip C style comments from a file?

    While this actually can be done, it's much harder than you'd think. For example, this one-liner ...
