I found this regex for matching comments on w3.org's CSS grammar page.
\/\*[^*]*\*+([^/*][^*]*\*+)*\/
Token by token explanation:
\/ <- an escaped '/', matches '/'
\* <- an escaped '*', matches '*'
[^*]* <- a negated character class with quantifier, matches anything but '*' zero or more times
\*+ <- an escaped '*' with quantifier, matches '*' once or more
( <- beginning of group
[^/*] <- negated character class, matches anything but '/' or '*' once
[^*]* <- negated character class with quantifier, matches anything but '*' zero or more times
\*+ <- escaped '*' with quantifier, matches '*' once or more
)* <- end of group with quantifier, matches group zero or more times
\/ <- an escaped '/', matches '/'
Regex Reference
Analysis on Regexper.com
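To see the pattern explained above in action, here is a minimal Python sketch. The sample stylesheet and the names CSS_COMMENT, css and comments are made up for illustration; only the pattern itself comes from the grammar page.

```python
import re

# The comment pattern from above, written as a raw string so the
# backslashes reach the regex engine unchanged.
CSS_COMMENT = re.compile(r"\/\*[^*]*\*+([^/*][^*]*\*+)*\/")

# Hypothetical sample input: one single-line and one multi-line comment.
css = """/* header */
body { margin: 0; }
/* spans
   two lines **/
p { color: red; }
"""

# finditer + group(0) returns the full text of each match
# (findall would return only the contents of the capturing group).
comments = [m.group(0) for m in CSS_COMMENT.finditer(css)]
print(comments)
```

Because the negated character classes also match newlines, the second comment is found without any regex flags.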
The reason yours finds only single-line comments is that, in typical regular expressions, . matches anything except newlines, whereas the other one uses a negated character class, which matches anything but the specified characters and so can match newlines.
However, if you were to fix that (there's usually an option for "single line" or "dot-all" matching, in which . also matches newlines; "multiline" mode is typically a different option that only affects ^ and $), you would find that it would match from the /* of the first comment to the */ of the last comment; you would have to use a non-greedy quantifier, .*?, to match no more than one comment.
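The three behaviours described here can be compared in a short Python sketch. The pattern /\*.*\*/ is only an assumption about what the simpler attempt looked like, and the sample string is made up.

```python
import re

# Hypothetical input: a single-line comment followed by a multi-line one.
css = "/* one */ a { } /* two\n*/ b { }"

# Default: . does not match newlines, so only the single-line comment is found.
dot_default = re.findall(r"/\*.*\*/", css)

# DOTALL lets . match newlines, but the greedy .* now runs from the
# first /* to the last */.
greedy = re.findall(r"/\*.*\*/", css, re.DOTALL)

# The non-greedy .*? stops at the first */ after each /*.
lazy = re.findall(r"/\*.*?\*/", css, re.DOTALL)

print(dot_default)
print(greedy)
print(lazy)
```

In Python the relevant flag is re.DOTALL; other regex flavors use a different name for the same option.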
However, the regular expression you give is more complex than that. Based on nikc.org's answer, I believe it is there to enforce the restriction that "comments may not be nested" without relying on a non-greedy quantifier, which not every regex flavor supports: each piece can only consume text that does not complete a */, so the match always ends at the first */. An internal /* is still allowed; it simply does not start a new comment. In flavors that do support non-greedy quantifiers, the pattern \/\*.*?\*\/ (with . matching newlines) would be appropriate to match comments /* like /* this */.
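As a quick check of the non-greedy pattern on a comment that contains an inner /*, here is a small Python sketch. The names LAZY_COMMENT, text and found, and the sample input, are made up for illustration.

```python
import re

# Non-greedy comment pattern; DOTALL so that . also matches newlines.
LAZY_COMMENT = re.compile(r"\/\*.*?\*\/", re.DOTALL)

# Hypothetical input: the first comment contains an inner "/*".
text = "x = 1; /* like /* this */ y = 2; /* second */"

# There is no capturing group, so findall returns the full matches.
found = LAZY_COMMENT.findall(text)
print(found)
```

The inner /* is treated as ordinary comment text, and each match ends at the first */ that follows its opening /*.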