Question
Let's say I have one regular expression A and another regular expression B as input. I want to create a new regular expression C which matches a line if and only if
- A matches the line and
- B does not match the line.
I am able to manually create C for very simple cases of A and B: Let's say A is x and B is y, then C = ^[^y]*x[^y]*$ would be a valid solution.
Obviously, the problem gets harder as A and B get more complex. Is there a generic algorithm for creating such a regular expression C out of A and B?
Note: Since regular languages are closed under intersection and complement, such an algorithm should theoretically exist. I am aware that the expressive power of regular expressions available in modern IT systems exceeds that of formal regular languages, but a solution where A and B are restricted to the subset of features available in formal languages, but C uses extended features of modern-day regex engines, is perfectly fine for me.
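The closure argument in the note can be sketched with a product automaton. This is only an illustration under stated assumptions: the two DFAs are hand-written for the toy case "line contains x" and "line contains y", and the helper names contains and difference are hypothetical. Turning an arbitrary regex into a DFA, and turning the product back into a regex, are not shown.

```python
def contains(ch):
    """Hand-written DFA for 'the line contains ch' (an assumption for this sketch).

    Returns (start_state, accepting_states, step_function).
    The state is a bool: has ch been seen so far?
    """
    def step(state, c):
        return state or c == ch
    return False, {True}, step


def difference(dfa_a, dfa_b):
    """Run both DFAs in lockstep; accept iff A accepts and B does not."""
    start_a, accepting_a, step_a = dfa_a
    start_b, accepting_b, step_b = dfa_b

    def accepts(line):
        state_a, state_b = start_a, start_b
        for c in line:
            state_a = step_a(state_a, c)
            state_b = step_b(state_b, c)
        # Intersection of L(A) with the complement of L(B).
        return state_a in accepting_a and state_b not in accepting_b

    return accepts


accepts_c = difference(contains("x"), contains("y"))
```

Here accepts_c plays the role of C for A = x and B = y: it accepts a line only when the A component ends in an accepting state and the B component does not.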
Answer 1:
Edit
Based on the OP's initial regex, and as pointed out by @ruakh in the comments below my answer, the OP has chosen to use ^(?!.*B).*A. This solution rejects any string that contains B, rather than what my original post (below) targeted, which is any string that matches B in its entirety, as was originally assumed and later clarified (in the comments below my answer) by the OP.
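A minimal sketch of that lookahead form in Python, assuming A and B are plain sub-patterns without anchors, named groups, or inline flags. The helper name a_and_not_b is hypothetical, and wrapping each operand in a non-capturing group is an addition of this sketch so that alternations inside A or B stay contained.

```python
import re


def a_and_not_b(a: str, b: str) -> re.Pattern:
    """Build ^(?!.*B).*A: the line contains A and does not contain B."""
    # (?:...) keeps a top-level "|" in A or B from splitting the whole pattern.
    return re.compile(rf"^(?!.*(?:{b})).*(?:{a})")


c = a_and_not_b("x", "y")
kept = [line for line in ["axb", "axy", "yx", "abc"] if c.search(line)]
```

With A = x and B = y, only lines that contain an x and no y end up in kept.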
Original Post
If I understand your question correctly, you're looking to match a string that matches one given pattern A but does not match pattern B, such that your new pattern C is composed of both A and B.
Simple regex
Given that pattern A is x and pattern B is y, the new regex pattern C should be as follows:
^(?!B$)A$
or with the sample regex you presented:
^(?!y$)x$
Maybe a better example to demonstrate this is the following:
A pattern: x.
B pattern: xx
C becomes: ^(?!xx$)x.$
This would match xa but not xx.
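The anchored form ^(?!B$)A$ can be tried out in Python as follows. This is a sketch rather than a general solution: the helper name full_a_not_b is hypothetical, the non-capturing groups around A and B are an addition, and A and B are assumed to be simple patterns that Python's re module accepts.

```python
import re


def full_a_not_b(a: str, b: str) -> re.Pattern:
    """Build ^(?!B$)A$: the whole line matches A but the whole line is not B."""
    return re.compile(rf"^(?!(?:{b})$)(?:{a})$")


c = full_a_not_b("x.", "xx")
results = {s: bool(c.search(s)) for s in ["xa", "xx", "x", "xab"]}
```

In results, only xa maps to True: xx is excluded by the lookahead, while x and xab do not match A as a whole line in the first place.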
Complex regex
With regard to more complex regular expressions, the result may depend entirely on the patterns and on the regex engine used. The regular expression could time out, and if recursion, control verbs, pattern modifiers, etc. are used, it could break entirely.
A better option would be to evaluate both regular expressions independently with code to determine the outcome.
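Evaluating the two expressions independently might look like the following Python sketch. The function name matches_a_not_b and the sample lines are assumptions for illustration. It uses re.search, so "matches" here means "matches somewhere in the line"; re.fullmatch could be swapped in if the whole line must match.

```python
import re


def matches_a_not_b(a: str, b: str, line: str) -> bool:
    """True iff pattern a matches the line and pattern b does not."""
    return re.search(a, line) is not None and re.search(b, line) is None


lines = ["axb", "axy", "abc"]
kept = [line for line in lines if matches_a_not_b("x", "y", line)]
```

Because each pattern is compiled and run on its own, group names, recursion, and modifiers in A cannot interfere with those in B.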
Example 1
Here's an example where the regular expression breaks because both patterns use the same predefined pattern name:
A: (?(DEFINE)(?<t>x))(?&t).
B: (?(DEFINE)(?<t>x))(?&t){2}
C: ^(?!(?(DEFINE)(?<t>x))(?&t){2}$)(?(DEFINE)(?<t>x))(?&t).$
It fails.
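Python's re module does not support (?(DEFINE)...) or (?&t), so the patterns above cannot be run there as written. An analogous name collision can still be reproduced with ordinary named groups; the patterns a and b below are stand-ins chosen for this sketch, not the ones from the example.

```python
import re

# Stand-in patterns: each one defines a group named "t".
a = r"(?P<t>x)."
b = r"(?P<t>x)(?P=t)"

# Each pattern compiles on its own.
re.compile(a)
re.compile(b)

# Pasting both into ^(?!B$)A$ defines the name "t" twice in one pattern,
# which Python's re module rejects at compile time.
try:
    re.compile(rf"^(?!{b}$){a}$")
    collided = False
except re.error:
    collided = True
```

After this runs, collided is True, which illustrates why textually combining two patterns is fragile even when each works in isolation.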
Example 2
Here's a recursion example that fails to work properly:
A: a(?R)z
B: az
C: ^(?!az$)a(?R)?z$
It fails, because (?R) now recurses into the whole of C, anchors and lookahead included, rather than into A alone.
Of course, this assumes that the initial assumption holds, namely that C: ^(?!B$)A$ is the correct pattern to use for matching A and not matching B.
Answer 2:
I'm guessing that the answer is most likely no. A, B, and C can be dependent or independent expressions, so the outcomes fall into the category of combinations, which also includes permutations, and there would be an infinite number of such expressions. I therefore highly doubt that there is one generic algorithm for this.
Source: https://stackoverflow.com/questions/57098751/combining-two-regular-expressions-a-and-b-into-c-a-and-not-b