Here's something I'm trying to do with regular expressions, and I can't figure out how. I have a big file, and strings abc, 123 and xyz
You could use lookaround.
/^abc(?!.*abc).*123.*(?<!xyz.*)xyz$/g
(I've not tested it.)
When your left- and right-hand delimiters are single characters, it can be easily solved with negated character classes. So, if your match is between a and c and should not contain b (literally), you may use:
a[^abc]*c
This is the same technique you use when you want to make sure there is a b in between the closest a and c:
a[^abc]*b[^ac]*c
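As a quick check of the two single-character patterns above, here is a minimal Python sketch. The sample strings and variable names are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical sample inputs, chosen only to exercise the two patterns.
no_b = re.compile(r'a[^abc]*c')            # closest a...c with no b in between
with_b = re.compile(r'a[^abc]*b[^ac]*c')   # closest a...c with a b in between

no_b_matches = no_b.findall("a--c abc a-a-c")
with_b_matches = with_b.findall("a-b-c ac a-a-b-c")

print(no_b_matches)    # ['a--c', 'a-c']
print(with_b_matches)  # ['a-b-c', 'a-b-c']
```

Because a is excluded from the character class, a match can never start at an earlier a and run past a later one, which is what keeps each match anchored to the closest pair.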
When your left- and right-hand delimiters are multi-character strings, you need a tempered greedy token:
abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz
To make sure it matches across lines, use the re.DOTALL flag when compiling the regex.
Note that to achieve a better performance with such a heavy pattern, you should consider unrolling it. It can be done with negated character classes and negative lookaheads.
Pattern details:
abc - match abc
(?:(?!abc|xyz|123).)* - match any character that is not the starting point of an abc, xyz or 123 character sequence
123 - a literal string 123
(?:(?!abc|xyz).)* - any character that is not the starting point of an abc or xyz character sequence
xyz - a trailing substring xyz
If re.S (the short name for re.DOTALL) is used, . matches any character, including line breaks.
See the Python demo:
import re
p = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)
s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
print(p.findall(s))
# => ['abc 123 xyz', 'abc 123 xyz', 'abc text 123 xyz']
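The unrolling mentioned earlier could look roughly like the sketch below. This is one possible unrolled form, not taken from the original answer: it consumes runs of characters that cannot start abc, 123 or xyz with a negated character class, and runs the lookahead only at a, 1 or x. The variable names are illustrative.

```python
import re

# Tempered greedy token version, as in the demo above.
tempered = re.compile(r'abc(?:(?!abc|xyz|123).)*123(?:(?!abc|xyz).)*xyz', re.DOTALL)

# Unrolled sketch (an assumption, not the original author's pattern):
# [^a1x]* eats "safe" characters in bulk; the lookahead is checked only
# at a, 1 or x, the only characters that can begin abc, 123 or xyz.
unrolled = re.compile(
    r'abc[^a1x]*(?:(?!abc|xyz|123)[a1x][^a1x]*)*'
    r'123[^ax]*(?:(?!abc|xyz)[ax][^ax]*)*xyz'
)

s = "abc 123 xyz\nabc abc 123 xyz\nabc text 123 xyz\nabc text xyz xyz"
tempered_matches = tempered.findall(s)
unrolled_matches = unrolled.findall(s)

print(unrolled_matches == tempered_matches)  # True
```

The unrolled pattern needs no re.DOTALL flag, since negated character classes already match line breaks.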
Using PCRE, a solution would be the pattern below, used with the m flag. If you want to match only from the start to the end of a line, add ^ at the beginning and $ at the end:
abc(?!.*(abc|xyz).*123).*123(?!.*(abc|xyz).*xyz).*xyz
The comment by hvd is quite appropriate, and this just provides an example. In SQL, for instance, I think it would be clearer to do:
where val like 'abc%123%xyz' and
val not like 'abc%abc%' and
val not like '%xyz%xyz'
I imagine something quite similar is simple to do in other environments.
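For instance, the same three checks can be sketched in Python with plain string operations. The helper name like_filter and the sample values are hypothetical, and the function assumes each value is tested as a whole string, as LIKE does.

```python
def like_filter(val: str) -> bool:
    """Mimic the three LIKE conditions with string checks (hypothetical helper)."""
    # val like 'abc%123%xyz'
    has_shape = val.startswith("abc") and val.endswith("xyz") and "123" in val[3:-3]
    # val not like 'abc%abc%' (given that val already starts with abc)
    repeated_abc = "abc" in val[3:]
    # val not like '%xyz%xyz' (given that val already ends with xyz)
    repeated_xyz = "xyz" in val[:-3]
    return has_shape and not repeated_abc and not repeated_xyz


# Made-up sample values, one per case.
samples = ["abc 123 xyz", "abc abc 123 xyz", "abc text 123 xyz", "abc text xyz xyz"]
results = [like_filter(v) for v in samples]
print(results)  # [True, False, True, False]
```

Splitting the condition into one positive check and two negative checks keeps each part readable, at the cost of scanning the string more than once.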