regex negative look around with 2 adjacent matches

独厮守ぢ 2020-12-21 05:02

Should be an easy question from someone out there:

If I run this JavaScript:

var regex = new RegExp("(?!cat)dog(?!cat)","g");
va         


        
2 Answers
  • 2020-12-21 05:21

    There are two problems with your approach.

    1. Your first lookahead needs to be a lookbehind. When you write (?!cat), the engine checks that the next three characters are not cat and then resets to where it started (that's how it looks ahead), and then you try to match dog at those same three characters. Therefore, the lookahead doesn't add anything: if you can match dog, you obviously can't match cat at the same position. What you want is a lookbehind (?<!cat) that checks that the preceding characters are not cat. Unfortunately, JavaScript engines that predate ES2018 don't support lookbehind.
    2. You want to logically OR the two lookarounds. In your case, if either lookaround fails, it causes the pattern to fail. Hence both requirements (of not having cat at either end) need to be fulfilled. But you actually want to OR them. If lookbehinds were supported, that would rather look like (?<!cat)dog|dog(?!cat) (note that the alternation splits the entire pattern apart). But as I said, lookbehinds are not supported there. The reason why you seemed to have *OR*ed the two lookarounds in your first catdogdog bit is that the preceding cat was simply not checked (see point 1).
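
To make the two points concrete, here is a small sketch. The sample strings and the "000" replacement are assumptions chosen for illustration; the last pattern uses a lookbehind, so it only runs on an engine that implements ES2018 lookbehind assertions.

```javascript
// The pattern from the question and the same pattern without the leading lookahead.
const original = /(?!cat)dog(?!cat)/g;
const simplified = /dog(?!cat)/g;

// Hypothetical sample inputs.
const samples = ["catdogdog", "catdogcat", "dogcat"];

// Point 1: the leading (?!cat) changes nothing, both patterns behave identically.
const withOriginal = samples.map((s) => s.replace(original, "000"));
const withSimplified = samples.map((s) => s.replace(simplified, "000"));
// both: ["cat000000", "catdogcat", "dogcat"]

// Point 2: ORing the two conditions (assumes ES2018 lookbehind support).
const ored = /(?<!cat)dog|dog(?!cat)/g;
const withOred = samples.map((s) => s.replace(ored, "000"));
// ["cat000000", "catdogcat", "000cat"]

console.log(withOriginal, withSimplified, withOred);
```

Only the ORed pattern replaces the dog in dogcat, because that dog is not preceded by cat even though it is followed by one.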

    How to work around lookbehinds then? Kolink's answer suggests (?!cat)...dog, which puts the lookaround at the position where a cat would start, and uses a lookahead. This has two new problems: it cannot match a dog at the beginning of the string (because the three characters in front are required), and it cannot match two consecutive dogs because matches cannot overlap (after matching the first dog, the engine requires three new characters for the ..., which would consume the next dog before actually matching dog again).
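
Both problems can be reproduced with a quick check. This is only a sketch; the input strings are made up for the demonstration, and the pattern is the one suggested in the other answer.

```javascript
// Lookahead-only workaround: three arbitrary characters (not "cat"), then dog.
const workaround = /(?!cat).{3}dog(?!cat)/g;

// Problem 1: a dog at the very start has no three characters in front of it.
const atStart = "dog".match(workaround);
// null

// Problem 2: of two consecutive dogs only the first is found. The second one
// would need the first "dog" as its three leading characters, but those were
// already consumed by the previous match.
const consecutive = "xxxdogdog".match(workaround);
// ["xxxdog"]

console.log(atStart, consecutive);
```

Note also that each match includes the three leading characters, so a plain replacement would overwrite them as well.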

    Sometimes you can work around it by reversing both the pattern and the string, hence turning the lookbehind into a lookahead, but in your case that would turn the lookahead at the end into a lookbehind.
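
For a case where only a lookbehind is needed, the reversal trick could look roughly like this. The helper names reverse and replaceDogNotAfterCat are hypothetical, and the sketch assumes plain ASCII input and a replacement without special $ sequences.

```javascript
// Hypothetical helper: reverse a string (adequate for plain ASCII input).
const reverse = (s) => [...s].reverse().join("");

// Emulates text.replace(/(?<!cat)dog/g, replacement) without a lookbehind:
// in the reversed string, "not preceded by cat" becomes "not followed by tac".
function replaceDogNotAfterCat(text, replacement) {
  const reversed = reverse(text).replace(/god(?!tac)/g, reverse(replacement));
  return reverse(reversed);
}

const result = replaceDogNotAfterCat("catdog dog", "000");
// "catdog 000"
console.log(result);
```

With a condition on both sides, as in the question, one of the two lookarounds always ends up as a lookbehind, so this does not help there.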

    The regex-only solution

    We have to be a bit cleverer. Since matches cannot overlap, we could try to match catdogcat explicitly, without replacing it (hence skipping them in the target string), and then just replace all dogs we find. We put the two cases in an alternation, so they are both tried at every position in the string (with the catdogcat option taking precedence, although it doesn't really matter here). The problem is how to get conditional replacement strings. But let's look at what we've got so far:

    text.replace(/(catdog)(?=cat)|dog/g, "$1[or 000 if $1 didn't match]")
    

    So in the first alternative we match a catdog and capture it into group 1 and check that there is another cat following. In the replacement string we simply write the $1 back. The beauty is, if the second alternative matched, the first group will be unused and hence be an empty string in the replacement. The reason why we only match catdog and use a lookahead instead of matching catdogcat right away is again overlapping matches. If we used catdogcat, then in the input catdogcatdogcat the first match would consume everything up to and including the second cat, hence the second dog could not be recognized by the first alternative.
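
The difference the lookahead makes can be seen in a short comparison. This is an illustrative sketch with an assumed input string; the replacement is just "$1", so an unprotected dog is replaced by an empty string here rather than by 000.

```javascript
// Hypothetical input with two protected dogs and one free-standing dog.
const text = "catdogcatdogcat dog";

// With the lookahead, each catdog is matched without consuming the cat
// behind it, so that cat can start the next catdog match.
const withLookahead = text.replace(/(catdog)(?=cat)|dog/g, "$1");
// "catdogcatdogcat "

// Matching catdogcat directly consumes the second cat, so the second dog
// falls through to the plain dog alternative and is removed.
const withoutLookahead = text.replace(/(catdogcat)|dog/g, "$1");
// "catdogcatcat "

console.log(JSON.stringify(withLookahead), JSON.stringify(withoutLookahead));
```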

    Now the only question is, how do we get a 000 into the replacement, if we used the second alternative.

    Unfortunately, we can't conjure up conditional replacements that are not part of the input string. The trick is to add a 000 to the end of the input string, capture that in a lookahead if we find a dog, and then write that back:

    text.replace(/$/, "000")                            
        .replace(/(catdog)(?=cat)|dog(?=.*(000))/g, "$1$2")
        .replace(/000$/, "")
    

    The first replacement adds 000 to the end of the string.

    The second replacement matches either catdog (checking that another cat follows) and captures it into group 1 (leaving 2 empty) or matches dog and captures 000 into group 2 (leaving group 1 empty). Then we write $1$2 back, which will be either the unadorned catdog or 000.

    The third replacement gets rid of our extraneous 000 at the end of the string.
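
Wrapped in a function, the three steps can be tried on a few inputs. The function name replaceUnprotectedDogs and the sample strings are assumptions made for this sketch, not part of the original question.

```javascript
// Replace every dog with 000 unless it sits between two cats (catdogcat).
function replaceUnprotectedDogs(text) {
  return text
    .replace(/$/, "000") // step 1: append the sentinel 000
    .replace(/(catdog)(?=cat)|dog(?=.*(000))/g, "$1$2") // step 2: keep catdog or emit 000
    .replace(/000$/, ""); // step 3: drop the sentinel again
}

const results = [
  "catdogcat",
  "dogcat",
  "catdog",
  "catdogcatdogcat dog",
].map(replaceUnprotectedDogs);
// ["catdogcat", "000cat", "cat000", "catdogcatdogcat 000"]

console.log(results);
```

One caveat of this sketch: since . does not match line terminators by default, a dog that is followed by a newline later in the input would not see the sentinel; [\s\S]* in place of .* would be one way to cover multi-line input.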

    The callback solution

    If you are not a fan of preparing the input string (appending the 000) or of the lookahead in the second alternative, you can instead use a slightly simpler regex with a replacement callback:

    text.replace(/(catdog)(?=cat)|dog/g, function(match, firstGroup) {
        return firstGroup ? firstGroup : "000"
    })
    

    With this version of replace, the supplied function gets called for each match and its return value is used as the replacement string. The function's first parameter is the entire match, the second parameter is the first capturing group (which will be undefined if the group doesn't participate in the match), and so on.

    So in the replacement callback we are free to conjure up our 000 if firstGroup is undefined (i.e. the dog option matched) or just return the firstGroup if it is present (i.e. the catdogcat option matched). This is a bit more concise and possibly easier to understand. However, the overhead of calling the function for every match can make it slower (although whether that matters depends on how often you want to do this). Pick your favorite!
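
As a self-contained sketch, the callback variant might be packaged like this. The function name replaceDogsWithCallback and the test inputs are hypothetical; the check against undefined mirrors the description of the callback parameters above.

```javascript
// Replace every dog with 000 unless it sits between two cats (catdogcat).
function replaceDogsWithCallback(text) {
  return text.replace(/(catdog)(?=cat)|dog/g, (match, firstGroup) =>
    // firstGroup is undefined when the plain dog alternative matched.
    firstGroup !== undefined ? firstGroup : "000"
  );
}

const callbackResults = [
  "catdogcat",
  "dogcat",
  "catdog",
  "catdogcatdogcat dog",
].map(replaceDogsWithCallback);
// ["catdogcat", "000cat", "cat000", "catdogcatdogcat 000"]

console.log(callbackResults);
```

Unlike the sentinel approach, this version does not depend on . matching the rest of the input, so it behaves the same on multi-line strings.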

  • 2020-12-21 05:34

    Your regex simplifies to dog(?!cat) (because the leading lookahead (?!cat) consumes nothing and can never fail where dog matches), so it is replacing any instance of dog that is not followed by cat.

    Try the regex (?!cat).{3}dog(?!cat)
