Can anyone help me with a regular expression to strip multi-line comments and single-line comments from a file?
eg:
\" WHOLE
From perlfaq6 "How do I use a regular expression to strip C style comments from a file?":
While this actually can be done, it's much harder than you'd think. For example, this one-liner
perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
will work in many but not all cases. You see, it's too simple-minded for certain kinds of C programs, in particular, those with what appear to be comments in quoted strings. For that, you'd need something like this, created by Jeffrey Friedl and later modified by Fred Curtis.
$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;
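To make the difference concrete, here is a small sketch that runs both substitutions over the same input. The sub names `strip_naive` and `strip_string_aware` and the sample input are made up for illustration; the two regexes are the ones quoted above.

```perl
use strict;
use warnings;

# Naive approach: delete everything between /* and */, even inside strings.
sub strip_naive {
    my ($src) = @_;
    $src =~ s{/\*.*?\*/}{}gs;
    return $src;
}

# String-aware approach: the perlfaq6 regex quoted above, unchanged.
# Non-comment text is captured in $2 and put back; comments leave $2 undefined.
sub strip_string_aware {
    my ($src) = @_;
    $src =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    return $src;
}

# Hypothetical input: one comment-like string literal, one real comment.
my $code = qq{puts("/* not a comment */"); /* real comment */\n};

print strip_naive($code);         # prints: puts("");
print strip_string_aware($code);  # prints: puts("/* not a comment */");
```

The naive version empties the string literal, while the string-aware version removes only the real comment.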
This could, of course, be more legibly written with the /x modifier, adding whitespace and comments. Here it is expanded, courtesy of Fred Curtis.
s{
   /\*         ##  Start of /* ... */ comment
   [^*]*\*+    ##  Non-* followed by 1-or-more *'s
   (
     [^/*][^*]*\*+
   )*          ##  0-or-more things which don't start with /
               ##    but do end with '*'
   /           ##  End of /* ... */ comment
 |             ##  OR various things which aren't comments:
   (
     "           ##  Start of " ... " string
     (
       \\.         ##  Escaped char
     |             ##    OR
       [^"\\]      ##  Non "\
     )*
     "           ##  End of " ... " string
   |             ##  OR
     '           ##  Start of ' ... ' string
     (
       \\.         ##  Escaped char
     |             ##    OR
       [^'\\]      ##  Non '\
     )*
     '           ##  End of ' ... ' string
   |             ##  OR
     .           ##  Anything other char
     [^/"'\\]*   ##  Chars which doesn't start a comment, string or escape
   )
 }{defined $2 ? $2 : ""}gxse;
A slight modification also removes C++ comments, possibly spanning multiple lines using a continuation character:
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
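As a rough illustration of how this variant behaves, the sketch below wraps it in a sub (the name `strip_c_and_cpp` and the sample input are assumptions, not part of the FAQ) and feeds it a line with `//` inside a string literal.

```perl
use strict;
use warnings;

# The C/C++ regex quoted above, unchanged. Text to keep is now captured in $3,
# because the // alternative introduced one more capture group before it.
sub strip_c_and_cpp {
    my ($src) = @_;
    $src =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
    return $src;
}

# Hypothetical input: a // inside a string, a // comment, and a /* */ comment.
my $code = qq{char *u = "http://example.com"; // url\nint y = 2; /* set y */\n};

print strip_c_and_cpp($code);
# prints: char *u = "http://example.com"; int y = 2;
```

Note that the `//` alternative ends with `\n`, so the newline that terminates a `//` comment is removed along with it and the following line is joined to the current one. The `//` inside the string literal is left alone.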
As is often the case in Perl, you can reach for CPAN: Regexp::Common::comment should help you. The one language I found that uses the comments you described is Nickle, but maybe PHP comments would be OK (// can also start a single-line comment).
Note that in any case, using regexps to strip out comments is dangerous; a full parser for the language is much less risky. A regexp-based stripper, for example, is likely to get confused by something like: print "/*";
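For the Regexp::Common route, a minimal sketch could look like the following. It assumes the module is installed from CPAN (it is not part of core Perl) and uses only the C comment pattern, `$RE{comment}{C}`; the sample input is made up. The module documents patterns for many other languages, so check its documentation for the exact key that matches your comment style.

```perl
use strict;
use warnings;
use Regexp::Common qw(comment);   # CPAN module; assumed to be installed

# Hypothetical input with one C comment and one C++ comment.
my $src = "int x; /* C comment */\nint y; // C++ comment\n";

# $RE{comment}{C} matches /* ... */ comments only.
my $c_comment = $RE{comment}{C};
(my $no_c = $src) =~ s/$c_comment//g;

print $no_c;
# prints:
# int x; 
# int y; // C++ comment
```

A bare comment pattern like this knows nothing about string literals, so the warning above about `print "/*";` still applies.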
There is also a non-Perl answer: use the program stripcmt:
StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In - First Out) filter or accept arguments on the commandline.
Remove /* */ comments (including multi-line)
s/\/\*.*?\*\///gs
I post this because it is simple; however, I believe it will trip up on nested comments like
/* sdafsdfsdf /*sda asd*/ asdsdf */
But as those are fairly uncommon, I prefer the simple regex.
Including tests:
use strict;
use warnings;
use Test::More qw(no_plan);
sub strip_comments {
    my $string = shift;
    $string =~ s#/\*.*?\*/##sg;   # strip multiline C comments
    return $string;
}
is(strip_comments('a/* comment1 */ code /* comment2 */b'),'a code b');
is(strip_comments('a/* comment1 /* comment2 */b'),'ab');
is(strip_comments("a/* comment1\n\ncomment */ code /* comment2 */b"),'a code b');
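The nested-comment case mentioned above can be pinned down with two more tests. This is a self-contained sketch that repeats the same `strip_comments` sub; the test descriptions and the second input are my own additions.

```perl
use strict;
use warnings;
use Test::More tests => 2;

sub strip_comments {
    my $string = shift;
    $string =~ s#/\*.*?\*/##sg;   # strip multiline C comments
    return $string;
}

# The non-greedy match stops at the first */, so the tail of the
# outer "comment" is left behind.
is(
    strip_comments('/* sdafsdfsdf /*sda asd*/ asdsdf */'),
    ' asdsdf */',
    'nested comment leaves its tail behind'
);

# With no closing */ there is nothing to match, so the text is unchanged.
is(
    strip_comments('a /* never closed'),
    'a /* never closed',
    'unterminated comment is left alone'
);
```

Whether the leftover ` asdsdf */` counts as a bug depends on the language being processed, so treat the first test as documentation of the behaviour rather than a requirement.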
This is a FAQ:
perldoc -q comment
Found in perlfaq6:
How do I use a regular expression to strip C style comments from a file?
While this actually can be done, it's much harder than you'd think. For example, this one-liner ...