I am trying to write a regular expression which finds all the comments in text.
For example, everything between /* and */.
Example:
/* Hello */
I encountered this problem several years ago and wrote an entire article about it.
If you don't have access to non-greedy matching (not all regex libraries support non-greedy) then you should use this regex:
/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/
If you do have access to non-greedy matching then you can use:
/\*(.|[\r\n])*?\*/
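As a quick sanity check, here is a minimal sketch that runs both patterns through Python's `re` module. The sample text and the helper name `find_comments` are made up for illustration; other regex engines may behave differently.

```python
import re

# Pattern that avoids non-greedy quantifiers (copied from above).
WITHOUT_LAZY = re.compile(r"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/")
# Pattern that relies on a non-greedy quantifier (copied from above).
WITH_LAZY = re.compile(r"/\*(.|[\r\n])*?\*/")

# Hypothetical sample: one single-line comment and one multi-line comment.
sample = "a = 1; /* first */ b = 2; /* second\n * spans lines **/ c = 3;"


def find_comments(pattern, text):
    """Return the full text of every match of `pattern` in `text`."""
    return [m.group(0) for m in pattern.finditer(text)]


print(find_comments(WITHOUT_LAZY, sample))
print(find_comments(WITH_LAZY, sample))
```

On this sample both patterns should return the same two comments, including the one that ends in `**/`.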
Also, keep in mind that regular expressions are just a heuristic for this problem. They cannot handle cases where something looks like a comment to the regular expression but actually isn't one:
someString = "An example comment: /* example */";
// The comment around this code has been commented out.
// /*
some_code();
// */
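To make the two failure cases concrete, here is a small sketch using Python's `re` module and the non-greedy pattern. The variable names are illustrative only.

```python
import re

COMMENT = re.compile(r"/\*(.|[\r\n])*?\*/")

# Case 1: the comment markers are inside a string literal.
in_string = 'someString = "An example comment: /* example */";'
# Case 2: the block-comment markers are themselves commented out.
commented_out = "// /*\nsome_code();\n// */\n"

# Stripping "comments" damages the string and deletes live code.
stripped_string = COMMENT.sub("", in_string)
stripped_code = COMMENT.sub("", commented_out)

print(stripped_string)
print(repr(stripped_code))
```

In the first case part of the string literal is removed; in the second, the call to `some_code()` disappears even though it was never inside a real comment.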
Just an additional note about using regex to remove comments from a programming language source file.
Warning!
When doing this, don't forget the case where /* or */ appears inside a string in the code, like var string = "/*"; (you never know what is in a huge code base that is not yours)!
So the best approach is to parse the document with a programming language and keep a boolean that records whether a string is currently open (and ignore any match inside an open string).
Also, a string delimited by " can contain an escaped \", so be careful with the regex!
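The boolean-state idea could look roughly like the sketch below. It is a simplified illustration, not a full parser: it assumes only double-quoted strings with backslash escapes, and it ignores single-quoted strings, line comments, and regex literals. The function name `find_block_comments` is hypothetical.

```python
from typing import List


def find_block_comments(source: str) -> List[str]:
    """Collect /* ... */ comments, skipping anything inside "..." strings."""
    comments = []
    i = 0
    in_string = False  # True while we are between an opening and closing "
    while i < len(source):
        ch = source[i]
        if in_string:
            if ch == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            i += 1
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                break  # unterminated comment: stop scanning
            comments.append(source[i:end + 2])
            i = end + 2
        else:
            i += 1
    return comments


code = 'var s = "/* not a comment */"; /* real */ var t = "a \\" /* still string */";'
print(find_block_comments(code))
```

Only the comment outside the string literals is reported; the escaped quote in the second string does not end the string early.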
Just want to add that for HTML comments the pattern is this:
\<!--(.|\n)*?-->
Unlike the example posted above, you were trying to match comments that spanned multiple lines. By default, . does not match a line break. Thus you have to enable the mode in which . matches line breaks in order to match multi-line comments. (In most regex flavors this is called "dotall" or "single-line" mode; "multi-line" mode usually only changes how ^ and $ behave.)
Also, you probably need to use .*? instead of .*. Otherwise the regex will make the largest match possible, which will be everything between the first open comment and the last close comment.
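Both points can be seen in a short sketch with Python's `re` module, where the flag that lets `.` match a line break is called `re.DOTALL` (Sublime Text's search may expose this differently). The sample text is made up.

```python
import re

text = "/* one */ keep_me(); /* two\nlines */"

# Greedy: runs from the first /* to the last */.
greedy = re.findall(r"/\*.*\*/", text, flags=re.DOTALL)
# Non-greedy: stops at the first */ after each /*.
lazy = re.findall(r"/\*.*?\*/", text, flags=re.DOTALL)
# Without DOTALL, "." cannot cross the line break in the second comment.
no_dotall = re.findall(r"/\*.*?\*/", text)

print(greedy)
print(lazy)
print(no_dotall)
```

The greedy version swallows `keep_me();` along with both comments, and the version without `re.DOTALL` misses the comment that spans two lines.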
I don't know how to enable that mode in Sublime Text 2, or whether it is available as a mode at all. However, you can insert a line break into the actual pattern by using CTRL + Enter. So, I would suggest this alternative:
/\*(.|\n)*?\*/
If Sublime Text 2 doesn't recognize the \n, you could alternatively use CTRL + Enter to insert a line break in the pattern, in place of \n.
The right answer: it is impossible. You cannot write a regular expression that correctly finds all comments, or even one type of comment, single-line or multi-line.
Regular expressions can only provide a partial match, one that would cover perhaps 90% of all cases, but that's it.
The syntax of regular expression literals is so complex that they can only be identified correctly in 100% of cases by doing a full expression evaluation, which in turn is based on tokenizing the code. The latter is a huge task, which is implemented by all AST parsers today. See AST Explorer
Only a properly written AST parser can tell you precisely where all regular expressions are located in your code. You would then have to write a comment parser on top of that.
Or, you could use one of the existing libraries that already do all that, like decomment.
RegEx examples where any head-on approach is going to stumble, being unable to tell a regular expression from a comment block:
/\//
- it will think this reg-ex is a single-line comment
/\/*/
- it will think this reg-ex opens a multi-line comment
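A small sketch of how these two literals trip up a head-on approach, using Python's `re` module on JavaScript-like input. `LINE_COMMENT` is a hypothetical naive single-line pattern chosen for illustration, and the surrounding code in the samples is made up.

```python
import re

BLOCK_COMMENT = re.compile(r"/\*(.|[\r\n])*?\*/")
LINE_COMMENT = re.compile(r"//.*")  # hypothetical naive single-line pattern

# JavaScript-like lines containing the two regex literals from above.
js_line = r"var re = /\//; bar();"
js_block = r"var re = /\/*/; foo(); /* real */"

line_hits = [m.group(0) for m in LINE_COMMENT.finditer(js_line)]
block_hits = [m.group(0) for m in BLOCK_COMMENT.finditer(js_block)]

print(line_hits)
print(block_hits)
```

In both cases the match starts inside the regex literal and swallows live code (`bar();` and `foo();`) that follows it.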