Should be an easy question for someone out there:
If I run this JavaScript:
var regex = new RegExp("(?!cat)dog(?!cat)","g");
va
There are two problems with your approach.
1. When you write (?!cat), the engine checks that the next three characters are not cat and then resets to where it started (that's how it looks ahead), and then you try to match dog at those same three characters. Therefore, the lookahead doesn't add anything: if you can match dog you obviously can't match cat at the same position. What you want is a lookbehind (?<!cat) that checks that the preceding characters are not cat. Unfortunately, JavaScript doesn't support lookbehind (at least not in engines that predate ES2018).
2. Both lookarounds (no cat at either end) need to be fulfilled. But you actually want to OR that. If lookbehinds were supported that would rather look like (?<!cat)dog|dog(?!cat) (note that the alternation splits the entire pattern apart). But as I said, lookbehinds are not supported. The reason why you seemed to have *OR*ed the two lookarounds in your first catdogdog bit is that the preceding cat was simply not checked (see point 1).
How to work around lookbehinds then? Kolink's answer suggests (?!cat)...dog, which puts the lookaround at the position where a cat would start, and uses a lookahead. This has two new problems: it cannot match a dog at the beginning of the string (because the three characters in front are required). And it cannot match two consecutive dogs because matches cannot overlap (after matching the first dog, the engine requires three new characters for ..., which would consume the next dog before actually matching dog again).
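The two limitations just described can be checked with a small sketch. The variable names here are only illustrative, and the pattern is the three-dot form quoted above:

```javascript
// The workaround pattern: a lookahead, three arbitrary characters, then dog.
const workaround = /(?!cat)...dog/g;

// 1) A dog at the very start has no three characters in front of it.
const atStart = "dog".match(workaround);

// 2) Two consecutive dogs: the first match consumes "xxxdog", and the
//    remaining "dog" is too short to satisfy "...dog" again.
const consecutive = "xxxdogdog".match(workaround);

// 3) The intended rejection still works: a dog preceded by cat is not matched.
const afterCat = "catdog".match(workaround);

console.log(atStart);     // null
console.log(consecutive); // [ 'xxxdog' ]
console.log(afterCat);    // null
```

So the pattern rejects what it should, but it also misses dogs that a true lookbehind would have found.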
Sometimes you can work around it by reversing both pattern and string, hence turning the lookbehind into a lookahead - but in your case that would turn the lookahead at the end into a lookbehind.
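For the simpler case where only a lookbehind is needed, the reversal trick might look like the following sketch. The helpers reverse and replaceDogNotAfterCat are hypothetical names, and the reversal assumes plain ASCII input:

```javascript
// Hypothetical helper: reverse a string (fine for ASCII, ignores surrogate pairs).
const reverse = (s) => s.split("").reverse().join("");

// Emulates (?<!cat)dog: in the reversed string, dog reads "god" and a
// preceding cat becomes a following "tac", which a lookahead can test.
function replaceDogNotAfterCat(text, replacement) {
    const flipped = reverse(text).replace(/god(?!tac)/g, reverse(replacement));
    return reverse(flipped);
}

console.log(replaceDogNotAfterCat("catdog dog", "000")); // "catdog 000"
```

Note that the replacement has to be reversed as well, since it is inserted into the reversed string.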
We have to be a bit cleverer. Since matches cannot overlap, we could try to match catdogcat explicitly, without replacing it (hence skipping it in the target string), and then just replace all dogs we find. We put the two cases in an alternation, so they are both tried at every position in the string (with the catdogcat option taking precedence, although it doesn't really matter here). The problem is how to get conditional replacement strings. But let's look at what we've got so far:
text.replace(/(catdog)(?=cat)|dog/g, "$1[or 000 if $1 didn't match]")
So in the first alternative we match a catdog and capture it into group 1 and check that there is another cat following. In the replacement string we simply write the $1 back. The beauty is, if the second alternative matched, the first group will be unused and hence be an empty string in the replacement. The reason why we only match catdog and use a lookahead instead of matching catdogcat right away is again overlapping matches. If we used catdogcat, then in the input catdogcatdogcat the first match would consume everything up to and including the second cat, hence the second dog could not be recognized by the first alternative.
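The difference between checking and consuming the trailing cat can be seen in a short sketch. It writes back only $1, so any dog matched by the second alternative simply disappears; the variable names are illustrative:

```javascript
const input = "catdogcatdogcat";

// Lookahead version: the trailing cat is only checked, not consumed,
// so it is still available as the leading cat of the next match.
const withLookahead = input.replace(/(catdog)(?=cat)|dog/g, "$1");

// Consuming version: the first match swallows the middle cat, so the
// second dog falls through to the plain dog alternative and is dropped.
const withoutLookahead = input.replace(/(catdogcat)|dog/g, "$1");

console.log(withLookahead);    // "catdogcatdogcat"
console.log(withoutLookahead); // "catdogcatcat"
```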
Now the only question is, how do we get a 000 into the replacement, if we used the second alternative.
Unfortunately, we can't conjure up conditional replacements that are not part of the input string. The trick is to add a 000 to the end of the input string, capture that in a lookahead if we find a dog, and then write that back:
text.replace(/$/, "000")
.replace(/(catdog)(?=cat)|dog(?=.*(000))/g, "$1$2")
.replace(/000$/, "")
The first replacement adds 000 to the end of the string.
The second replacement matches either catdog (checking that another cat follows) and captures it into group 1 (leaving group 2 empty) or matches dog and captures 000 into group 2 (leaving group 1 empty). Then we write $1$2 back, which will be either the unadorned catdog or 000.
The third replacement gets rid of our extraneous 000 at the end of the string.
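Wrapped in a function, the three steps can be tried on a few inputs. This is only a sketch: replaceLoneDogs is a made-up name, and it assumes single-line input, because . does not match line breaks and the lookahead would not reach the appended 000 across one:

```javascript
// Hypothetical wrapper around the three replacements described above.
// Assumes the input contains no line breaks.
function replaceLoneDogs(text) {
    return text
        .replace(/$/, "000")                                // append the marker
        .replace(/(catdog)(?=cat)|dog(?=.*(000))/g, "$1$2") // keep catdog or emit 000
        .replace(/000$/, "");                               // strip the marker again
}

console.log(replaceLoneDogs("catdogdog"));       // "cat000000"
console.log(replaceLoneDogs("catdogcat"));       // "catdogcat"
console.log(replaceLoneDogs("catdogcatdogcat")); // "catdogcatdogcat"
console.log(replaceLoneDogs("dog"));             // "000"
```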
If you are not a fan of preparing the input string, and the lookahead in the second option, you can instead use a slightly simpler regex with a replacement callback:
text.replace(/(catdog)(?=cat)|dog/g, function(match, firstGroup) {
    // firstGroup is undefined when the plain dog alternative matched
    return firstGroup ? firstGroup : "000";
});
With this version of replace, the supplied function gets called for each match and its return value is used as the replacement string. The function's first parameter is the entire match, the second parameter is the first capturing group (which will be undefined if the group doesn't participate in the match) and so on...
So in the replacement callback we are free to conjure up our 000 if firstGroup is undefined (i.e. the dog option matched) or just return the firstGroup if it is present (i.e. the catdogcat option matched). This is a bit more concise and possibly easier to understand. However, the overhead of calling the function may make it noticeably slower (although whether that matters depends on how often you want to do this). Pick your favorite!
Your regex simplifies to dog(?!cat) (because the first lookahead consumes nothing), so it is replacing any instance of dog that is not followed by cat.
Try the regex (?!cat).{3}dog(?!cat)
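The claimed simplification is easy to check on a handful of inputs. The sample strings below are made up for illustration:

```javascript
const original = /(?!cat)dog(?!cat)/g;
const simplified = /dog(?!cat)/g;

// Illustrative inputs only; any string should give the same result for both.
const samples = ["catdogdog", "catdogcat", "dogcat", "dog", "catdog dogcat dog"];

// The leading lookahead can never fail where dog matches, so both
// patterns should produce identical replacements.
const agree = samples.every(
    (s) => s.replace(original, "000") === s.replace(simplified, "000"));

console.log(agree);                                  // true
console.log("catdogdog".replace(simplified, "000")); // "cat000000"
```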