Regular expression to remove comments

轮回少年 2021-02-07 14:49

I am trying to write a regular expression that finds all the comments in a text, for example everything between /* and */. Example:

/* Hello */

5 Answers
  • 2021-02-07 14:51

    I encountered this problem several years ago and wrote an entire article about it.

    If you don't have access to non-greedy matching (not all regex libraries support non-greedy quantifiers), then you should use this regex:

    /\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/
    

    If you do have access to non-greedy matching then you can use:

    /\*(.|[\r\n])*?\*/
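
    As a rough illustration, here is how the two patterns above might be applied in JavaScript. This is only a sketch: the helper name stripBlockComments and the sample input are made up for the example, and the forward slashes are escaped because the patterns are written as regex literals.

```javascript
// Both patterns from the answer, written as JavaScript regex literals
// (forward slashes must be escaped inside a literal).
const greedySafe = /\/\*([^*]|[\r\n]|(\*+([^*\/]|[\r\n])))*\*+\//g;
const nonGreedy = /\/\*(.|[\r\n])*?\*\//g;

// Hypothetical helper: delete every match of `pattern` from `source`.
function stripBlockComments(source, pattern) {
  return source.replace(pattern, "");
}

// Made-up sample with a one-line and a two-line comment.
const sample = "a = 1; /* first */ b = 2; /* second\n   line **/ c = 3;";

console.log(stripBlockComments(sample, greedySafe));
console.log(stripBlockComments(sample, nonGreedy));
```

    On this input both patterns should remove the same two comments; they differ only in which regex features they rely on.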
    

    Also, keep in mind that regular expressions are just a heuristic for this problem. A regular expression cannot handle cases in which something looks like a comment to the pattern but actually isn't one:

    someString = "An example comment: /* example */";
    
    // The comment around this code has been commented out.
    // /*
    some_code();
    // */
    
  • 2021-02-07 14:54

    Just an additional note about using a regex to remove comments from a programming language source file.

    Warning!

    When doing this, do not forget that /* or */ can appear inside a string literal in the code, as in var string = "/*"; (you never know what you will run into when parsing a large codebase that is not yours)!

    So the best approach is to scan the document with a program and keep a boolean that records whether a string is currently open, ignoring any match inside an open string.

    Also, a string delimited by " can contain an escaped \", so be careful with the regex!
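
    The "boolean for an open string" idea could look roughly like the scanner below. It is a minimal sketch, not a complete solution: the function name stripCommentsOutsideStrings is hypothetical, and it assumes only double-quoted strings with backslash escapes (single quotes, template literals, // comments and regex literals are not handled).

```javascript
// Hypothetical helper: remove /* ... */ comments but leave string contents alone.
// Assumption: only double-quoted strings with backslash escapes exist in the input.
function stripCommentsOutsideStrings(source) {
  let out = "";
  let inString = false; // the "open string" flag described in the answer
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (inString) {
      out += ch;
      if (ch === "\\" && i + 1 < source.length) {
        // Copy the escaped character too, so \" does not close the string.
        out += source[i + 1];
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && source[i + 1] === "*") {
      // Skip to the closing */ (or to the end of input if it is missing).
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// Made-up input: a comment-like string, a real comment, and a string with \" in it.
const code = 'var s = "/* not a comment */"; /* real comment */ var t = "a \\" /* b";';
console.log(stripCommentsOutsideStrings(code));
```

    Only the real comment should disappear; the /* sequences inside the two strings stay put because the flag is set when they are reached.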

  • 2021-02-07 14:59

    Just want to add that for HTML comments it is this:

    <!--(.|\n)*?-->
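
    A quick way to try the HTML pattern, sketched in JavaScript with a made-up input string (the leading < needs no escape in a JavaScript regex). Like the other patterns here, it is a heuristic and knows nothing about comment-like text inside scripts or attribute values.

```javascript
// HTML comment pattern from the answer, as a global regex so every comment is removed.
const htmlComment = /<!--(.|\n)*?-->/g;

// Made-up sample with a two-line comment and a one-line comment.
const html = "<p>one</p><!-- note\nacross lines --><p>two</p><!-- x -->";
const cleaned = html.replace(htmlComment, "");

console.log(cleaned);
```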
    
  • 2021-02-07 15:15

    Unlike in the example posted above, you are trying to match comments that span multiple lines. By default, . does not match a line break. Thus you have to enable dot-all (single-line) mode in the regex to match multi-line comments; multi-line mode, despite its name, only changes how ^ and $ behave.

    Also, you probably need to use .*? instead of .*. Otherwise it will make the largest match possible, which will be everything between the first open comment and the last close comment.

    I don't know how to enable dot-all matching mode in Sublime Text 2. I'm not sure it is available as a mode. However, you can insert a line break into the actual pattern by using CTRL + Enter. So, I would suggest this alternative:

    /\*(.|\n)*?\*/
    

    If Sublime Text 2 doesn't recognize the \n, you could alternatively use CTRL + Enter to insert a line break in the pattern, in place of \n.
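
    The difference between these variants can be seen in a small JavaScript experiment. This is a sketch with a made-up input, and JavaScript's regex flavor is an assumption here (Sublime Text uses a different engine); the s flag is JavaScript's dotAll flag from ES2018.

```javascript
// Made-up sample: one comment spanning two lines, one on a single line.
const text = "x /* one\n   two */ y /* three */ z";

// A bare dot stops at the line break, so the two-line comment survives.
const dotOnly = text.replace(/\/\*.*?\*\//g, "");

// (.|\n) lets the pattern cross line breaks.
const withAlternation = text.replace(/\/\*(.|\n)*?\*\//g, "");

// With the s (dotAll) flag, a bare dot also matches line breaks.
const withDotAll = text.replace(/\/\*.*?\*\//gs, "");

// Greedy .* runs from the first /* to the last */, swallowing the code in between.
const greedy = text.replace(/\/\*.*\*\//gs, "");

console.log(JSON.stringify(dotOnly));
console.log(JSON.stringify(withAlternation));
console.log(JSON.stringify(withDotAll));
console.log(JSON.stringify(greedy));
```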

  • 2021-02-07 15:17

    The right answer: it is impossible. You cannot write a regular expression that correctly finds all comments, or even all comments of one type, single-line or multi-line.

    Regular expressions can only provide a partial match, one that would cover perhaps 90% of all cases, but that's it.

    The syntax of regular expression literals is so complex that the only way to identify them correctly in 100% of cases is full expression evaluation, which in turn requires tokenizing the code. That is a huge task, one that AST parsers already implement. See AST Explorer.

    Only a properly written AST parser can tell you precisely where all the regular expressions are located in your code. You would then have to build your comment remover on top of such a parser.

    Or, you could use one of the existing libraries that already do all that, like decomment.


    RegEx examples where any head-on approach is going to stumble, being unable to tell a regular expression from a comment block:

    • /\// - it will think this reg-ex is a single-line comment
    • /\/*/ - it will think this reg-ex opens a multi-line comment
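
    To see the two failures concretely, here is a small JavaScript sketch. The patterns lineComment and blockComment are illustrative stand-ins for a "head-on" approach, not taken from any library, and the source strings are made up around the two regex literals listed above.

```javascript
// Naive comment patterns of the kind the answer warns about (illustrative only).
const lineComment = /\/\/.*$/gm;
const blockComment = /\/\*[\s\S]*?\*\//g;

// Source text containing the two regex literals from the list above.
const src1 = "var a = /\\//; f(a);";             // text: var a = /\//; f(a);
const src2 = "var b = /\\/*/; g(b); /* real */"; // text: var b = /\/*/; g(b); /* real */

// The tail of the first regex literal is taken for a single-line comment.
const mangled1 = src1.replace(lineComment, "");

// The second regex literal appears to open a block comment, which then
// runs on to the closing */ of the real comment.
const mangled2 = src2.replace(blockComment, "");

console.log(mangled1);
console.log(mangled2);
```

    In both cases the regex literal is cut in half and the code after it is lost, which is the kind of damage a tokenizer-based tool avoids.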