Capture a comma only if it is between double quotes

Submitted by 时间秒杀一切 on 2019-12-02 13:09:24

Conceptually, I think you are not thinking about this correctly.

Lookarounds are evaluated relative to the current search position.

In your lookahead, you are negatively matching the comma in an expression before it even finds the comma.

This is called overlap.

Lookaheads are usually placed after a matched subexpression: once that subexpression
has been consumed, the position advances, and then the assertion is checked.

Likewise, Lookbehinds typically go before the object subexpression.
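
To make the placement concrete, here is a small Python sketch. The sample text and the variable names are made up for illustration and are not from the question. The lookahead sits after the digits it qualifies, and the lookbehind sits before the currency code it qualifies.

```python
import re

# Hypothetical sample text, chosen only to illustrate placement.
text = "price: 100USD, 200EUR"

# Lookahead after the consumed digits: keep the digits only if "USD" follows.
usd_amounts = re.findall(r"\d+(?=USD)", text)

# Lookbehind before the target: keep a 3-letter code only if a digit precedes it.
codes = re.findall(r"(?<=\d)[A-Z]{3}", text)

print(usd_amounts)  # ['100']
print(codes)        # ['USD', 'EUR']
```

In both cases the assertion is checked at the position the engine has reached, and it consumes nothing itself.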

So, your regex is actually this:

 ,
 (?!
      ( [^"]* " [^"]* " )*
      [^"]* $ 
 ) 

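When you lay it out like this, you can easily see that, after removing [^"]*$,
the remaining ( [^"]* " [^"]* " )* matches at every position in the string,
because the * makes it optional (it can match zero times). A negative lookahead
around something that always matches will always fail.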

If you were to change it to ( [^"]* " [^"]* " )+ then it would find
something concrete to negatively match against.
The $ was serving that purpose before.
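
The three variants can be compared with a quick Python sketch. The sample string is an assumption of mine, and I use non-capturing groups (?: ) instead of the capturing ones above, which does not change what matches.

```python
import re

# Hypothetical sample: the comma at index 6 is the only one inside quotes.
s = 'a,b,"c,d",e'

def comma_positions(pattern):
    """Return the start index of every match of pattern in s."""
    return [m.start() for m in re.finditer(pattern, s)]

# Original form: the pairs are anchored by [^"]*$.
anchored = comma_positions(r',(?!(?:[^"]*"[^"]*")*[^"]*$)')

# [^"]*$ removed: the starred group can match nothing, so the
# negative lookahead fails at every comma.
star_only = comma_positions(r',(?!(?:[^"]*"[^"]*")*)')

# * changed to +: the lookahead now needs at least one real pair of quotes.
plus_only = comma_positions(r',(?!(?:[^"]*"[^"]*")+)')

print(anchored)   # [6]
print(star_only)  # []
print(plus_only)  # [6, 9]
```

Note that the + variant is not equivalent to the anchored one: on this sample it also accepts the last comma, because no pair of quotes follows it.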

Hope you have a better understanding now.

  • [^"]*" matches any number of characters except quotes, followed by a quote.
  • .*?" matches any number of characters, including quotes, followed by a quote.

Now the ? in the second regex makes the * quantifier lazy, which means that it asks it nicely to match as few characters as possible to make the match happen. Therefore, in the string abc"def", both regexes will match the same text. So far, so good.
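
A quick check in Python (only the sample string abc"def" comes from the text above; the variable names are mine) shows both patterns stopping at the first quote:

```python
import re

s = 'abc"def"'

# Negated class: cannot cross a quote, so it stops at the first one.
negated = re.match(r'[^"]*"', s).group()

# Lazy dot: could cross a quote, but has no reason to here.
lazy = re.match(r'.*?"', s).group()

print(negated)  # abc"
print(lazy)     # abc"
```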

The problem is now that you've enclosed that regex in a negative lookahead assertion which has to make sure that the regex inside it is impossible to match. Since the dot may also match a quote if it has to, it will do so in order to make a match possible, and that will cause the lookahead to fail unless there are only two quotes left in the string.
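
The difference shows up as soon as something forces the pattern to reach further. The following Python sketch is a simplified illustration, not your actual regex: I anchor both patterns to the end of a hypothetical string, and then wrap them in a negative lookahead.

```python
import re

# Hypothetical string with two quotes.
s = 'a"b"'

# Forced to span the whole string, the lazy dot swallows the first quote.
lazy_full = re.fullmatch(r'.*?"', s)
# The negated class cannot cross the first quote, so there is no match.
negated_full = re.fullmatch(r'[^"]*"', s)

# Inside a negative lookahead the roles flip: the lookahead succeeds
# only when the inner pattern cannot match at all.
lazy_guard = re.match(r'(?!.*?"$)', s)        # inner matches, so this is None
negated_guard = re.match(r'(?![^"]*"$)', s)   # inner fails, so an empty match

print(lazy_full.group())          # a"b"
print(negated_full)               # None
print(lazy_guard)                 # None
print(negated_guard is not None)  # True
```

Laziness only sets the order in which the engine tries things. It does not stop the dot from matching a quote when that is the only way to get an overall match.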

For your second question, [^"]*$ makes sure that other characters besides quotes are allowed to appear at the end of the string.
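
As a rough Python illustration (the sample strings are assumptions, and I use non-capturing groups), compare the quote-pair pattern with and without the trailing [^"]*:

```python
import re

# Pairs of quotes, then optional non-quote text up to the end.
pairs_then_tail = re.compile(r'(?:[^"]*"[^"]*")*[^"]*$')
# Same pattern without the trailing [^"]*: the string must end right after a quote.
pairs_only = re.compile(r'(?:[^"]*"[^"]*")*$')

with_tail = 'x "y" tail'   # hypothetical: text after the closing quote
no_tail = 'x "y"'          # hypothetical: ends on the closing quote

tail_ok = pairs_then_tail.match(with_tail) is not None
tail_rejected = pairs_only.match(with_tail) is None
both_accept = (pairs_then_tail.match(no_tail) is not None
               and pairs_only.match(no_tail) is not None)

print(tail_ok)        # True
print(tail_rejected)  # True
print(both_accept)    # True
```

Without the trailing [^"]*, a balanced remainder that has text after its last quote would not be recognised as balanced.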
