Regular Expression for comments but not within a “string” / not in another container

Posted by 这一生的挚爱 on 2020-01-04 09:33:50

Question


I need a regular expression that finds single-line and multi-line comments, but not when they appear inside a string (e.g. "my /* string").

For testing (# starts a single-line comment, /* and */ delimit a multi-line one):

# complete line should be found
lorem ipsum # from this to line end
/*
  all three lines should be found
*/ but not here anymore
var x = "this # should not be found"
var y = "this /* shouldn't */ match either"
var z = "but" & /* this must match */ "_"

SO's syntax highlighting displays this really well; I basically want all the gray text.
I don't care whether it's a single regex or two separate ones. ;)

EDIT: One more thing: the opposite would also satisfy me, i.e. searching for a string that is not in a comment.
This is my current string matching: "[\s\S]*?(?<!\\)" (admittedly, it will not work with "\\")
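
A common way around the "\\" problem is to consume each backslash together with the character that follows it, so an escaped backslash can never be mistaken for an escape of the closing quote. Below is a minimal Python sketch, assuming backslash is the only escape character in the language being scanned; the names NAIVE, ESCAPE_AWARE and sample are made up for illustration.

```python
import re

# The pattern from the question: stop at the first quote that is not
# directly preceded by a backslash.
NAIVE = re.compile(r'"[\s\S]*?(?<!\\)"')

# Assumed alternative: inside the quotes, accept either an ordinary character
# or a backslash followed by any single character (an escape pair).
ESCAPE_AWARE = re.compile(r'"(?:[^"\\]|\\[\s\S])*"')

# A string that ends in an escaped backslash, followed by a comment.
sample = r'x = "a\\" # comment with "quote"'

print(NAIVE.search(sample).group())         # runs past the real closing quote
print(ESCAPE_AWARE.search(sample).group())  # stops at the real closing quote
```

The lookbehind version rejects the real closing quote because the character before it is a backslash, whereas the escape-pair version has already consumed both backslashes as one unit by the time it reaches that quote.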

EDIT2:
OK finally I wrote my own comment parser -.-
And if someone else is interested in the source code, grab it from here: https://github.com/relikd/CommentParser


回答1:


Here's one possibility (it does have an Achilles heel that I'll get to):

(#[^"\n\r]*(?:"[^"\n\r]*"[^"\n\r]*)*[\r\n]|/\*([^*]|\*(?!/))*?\*/)(?=[^"]*(?:"[^"]*"[^"]*)*$)

In action here

With the GLOBAL and DOTALL flags, but not the MULTILINE flag.

Explanation of the regex:

(
  #[^"\n\r]*                         Hash mark followed by non-" and non-end-of-line
    (?:"[^"\n\r]*"[^"\n\r]*)*        If any quotes in the comment, they must be balanced
    [\r\n]                           Followed by end-of-line ($ except we 
                                      don't have multiline flag)

  |                                  OR
  /\*([^*]|\*(?!/))*?\*/             /* xxx */ sort of comment
  )                                  BOTH FOLLOWED BY
(?=[^"]*(?:"[^"]*"[^"]*)*$)           only a *balanced* number of quotes for the 
                                      *rest of the code :O!*

However, this relies on balanced quotes being used throughout the text (it also doesn't take into account escaped quotes, but it's easy enough to modify the regex to take that into account).

If a user has a comment with a " in it that isn't balanced...boom. You're screwed!

Regex is generally not recommended for things like HTML/code parsing, but if you can rely on the fact that quotes have to balance when you define a string, etc., you can sometimes get away with it.

Since you are also parsing comments, which have no set structure (i.e. you are not guaranteed that quotes within comments will be balanced), you won't be able to find a regex solution that works here.

Anything you think up can be outwitted by an unbalanced quote in a comment somewhere (say the comment was # remove all the " marks), or by multiline strings (where on a given line there may be unbalanced quotes).
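
To see this failure mode concretely, here is a hedged Python sketch that runs the regex above against the question's test input, once as given and once with the comment # remove all the " marks appended. The helper name comments and the variables SAMPLE, good and bad are assumptions for illustration; finditer stands in for the GLOBAL flag.

```python
import re

# The pattern from the answer, copied unchanged. re.DOTALL mirrors the flags
# named there, although the pattern contains no "." for it to affect.
PATTERN = re.compile(
    r'(#[^"\n\r]*(?:"[^"\n\r]*"[^"\n\r]*)*[\r\n]|/\*([^*]|\*(?!/))*?\*/)'
    r'(?=[^"]*(?:"[^"]*"[^"]*)*$)',
    re.DOTALL,
)

# The test input from the question.
SAMPLE = '''# complete line should be found
lorem ipsum # from this to line end
/*
  all three lines should be found
*/ but not here anymore
var x = "this # should not be found"
var y = "this /* shouldn't */ match either"
var z = "but" & /* this must match */ "_"
'''


def comments(text):
    """Return the text of group 1 (the comment) for every match."""
    return [m.group(1) for m in PATTERN.finditer(text)]


good = comments(SAMPLE)
# One stray quote in a trailing comment changes the parity of the quote
# count seen by the lookahead of every earlier candidate.
bad = comments(SAMPLE + '# remove all the " marks\n')

print(good)
print(bad)
```

On the unmodified input this should report the four comments the question asks for. Once the stray quote is appended, every real comment is followed by an odd number of quotes and is rejected, while the /* shouldn't */ inside the string is followed by an even number and is reported instead.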

Bottom line - you can probably make a regex that will work in most cases, but not for all. To get something watertight you'll have to write some code.
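
As a rough idea of what "some code" could look like, here is a minimal character-by-character scanner in Python. It is only a sketch under stated assumptions: strings are double-quoted and use backslash escapes, # comments run to the end of the line, and /* */ comments do not nest. The function name scan_comments is hypothetical and is not taken from the CommentParser project linked in the question.

```python
from typing import List


def scan_comments(src: str) -> List[str]:
    """Collect # and /* */ comments that start outside double-quoted strings."""
    found = []
    i, n = 0, len(src)
    while i < n:
        if src[i] == '"':
            # Skip over a string literal. Assumption: a backslash escapes
            # the character after it, so step over both at once.
            i += 1
            while i < n and src[i] != '"':
                i += 2 if src[i] == "\\" else 1
            i += 1  # step past the closing quote
        elif src[i] == "#":
            # Single-line comment: everything up to (not including) the newline.
            end = src.find("\n", i)
            end = n if end == -1 else end
            found.append(src[i:end])
            i = end
        elif src.startswith("/*", i):
            # Multi-line comment: everything up to and including the next */.
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            found.append(src[i:end])
            i = end
        else:
            i += 1
    return found


src = 'var z = "but # not this" & /* this must match */ "_"\n# remove all the " marks\n'
print(scan_comments(src))
```

Because the scanner always knows whether it is inside a string or a comment, an unbalanced quote inside a comment is consumed as part of that comment and cannot disturb anything after it.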




回答2:


I would use two regular expressions for this:

  1. /(\/\*[\s\S]*?\*\/)|(#.*$)/m to find all the comments; the "m" modifier makes $ match at the end of each line, and [\s\S] lets a /* */ comment span several lines
  2. /"[^"]*?"/ to find all the strings

If you apply the highlighting to the comments first and only afterwards to the strings, the invalid comments should disappear.
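
Outside a highlighter, the same two-pass idea can be approximated by collecting both kinds of matches and discarding every comment that starts inside a string match. The Python sketch below is one interpretation of that, not the answer's own code: the function name comments_outside_strings is hypothetical, the comment pattern is written so that a block comment ends at */ and may span lines, and the string pattern ignores escaped quotes just as pattern 2 does.

```python
import re

# Pass 1: every candidate comment, including those that sit inside strings.
COMMENT = re.compile(r'/\*[\s\S]*?\*/|#.*$', re.MULTILINE)
# Pass 2: every double-quoted string (no escape handling).
STRING = re.compile(r'"[^"]*?"')


def comments_outside_strings(src):
    """Keep only the comment matches that do not start inside a string match."""
    strings = [m.span() for m in STRING.finditer(src)]
    return [
        m.group()
        for m in COMMENT.finditer(src)
        if not any(start <= m.start() < end for start, end in strings)
    ]


src = (
    'lorem ipsum # from this to line end\n'
    'var x = "this # should not be found"\n'
    'var z = "but" & /* this must match */ "_"\n'
)
print(comments_outside_strings(src))
```

Note that the string pass pairs quotes from the start of the text, so a stray quote in an earlier comment would probably shift every pairing after it; this sketch makes no attempt to handle that case.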



Source: https://stackoverflow.com/questions/9203774/regular-expression-for-comments-but-not-within-a-string-not-in-another-conta
