Combining two regular expressions A and B into C = (A and not B)

Submitted by 强颜欢笑 on 2019-12-24 04:24:09

Question


Let's say I have one regular expression A and another regular expression B as input. I want to create a new regular expression C which matches a line if and only if

  • A matches the line and
  • B does not match the line.

I am able to create C manually for very simple cases of A and B: if A is x and B is y, then C = ^[^y]*x[^y]*$ is a valid solution.
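A hand-written C like this can be sanity-checked by comparing it against the definition on every short string. The following is a minimal sketch in Python (my choice of language), assuming that "matches the line" means "matches somewhere in the line"; the alphabet xyz and the length limit of 4 are arbitrary choices for illustration:

```python
import itertools
import re

A = re.compile(r"x")              # pattern A from the example
B = re.compile(r"y")              # pattern B from the example
C = re.compile(r"^[^y]*x[^y]*$")  # hand-written combination

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            yield "".join(chars)

# Collect strings where C disagrees with "A matches and B does not".
mismatches = [
    s for s in all_strings("xyz", 4)
    if bool(C.search(s)) != (bool(A.search(s)) and not B.search(s))
]
print(mismatches)  # an empty list means C agrees on every tested string
```

A check like this cannot prove C correct in general, but it catches mistakes quickly.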

Obviously, the problem gets harder as A and B get more complex. Is there a generic algorithm for creating such a regular expression C out of A and B?


Note: Since regular languages are closed under intersection and complement, such an algorithm should theoretically exist. I am aware that the expressive power of regular expressions available in modern IT systems exceeds that of formal regular languages, but a solution where A and B are restricted to the subset of features available in formal languages, but C uses extended features of modern-day regex engines, is perfectly fine for me.
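To illustrate the closure argument, here is a rough sketch of the automaton side only: complement a complete DFA by flipping its accepting states, and intersect two DFAs with the product construction. The DFA encoding, the helper names, and the three-letter alphabet are my own assumptions for illustration; converting a regular expression to a DFA and the resulting DFA back into a regular expression is not shown.

```python
from itertools import product

ALPHABET = "xyz"

# A DFA here is a tuple (start_state, accepting_states, transition_dict).
# Both example automata track one bit: "has the symbol been seen yet?"
def contains_symbol(symbol):
    delta = {(seen, ch): seen or ch == symbol
             for seen in (False, True) for ch in ALPHABET}
    return False, {True}, delta

def states_of(dfa):
    start, _, delta = dfa
    return {start} | {q for q, _ in delta} | set(delta.values())

def complement(dfa):
    """Flip accepting and non-accepting states (the DFA must be complete)."""
    start, accepting, delta = dfa
    return start, states_of(dfa) - accepting, delta

def intersection(dfa1, dfa2):
    """Product construction: run both automata in lockstep."""
    s1, acc1, d1 = dfa1
    s2, acc2, d2 = dfa2
    delta = {((p, q), ch): (d1[p, ch], d2[q, ch])
             for p, q in product(states_of(dfa1), states_of(dfa2))
             for ch in ALPHABET}
    return (s1, s2), set(product(acc1, acc2)), delta

def accepts(dfa, text):
    state, accepting, delta = dfa
    for ch in text:
        state = delta[state, ch]
    return state in accepting

# C = (contains x) AND NOT (contains y)
dfa_c = intersection(contains_symbol("x"), complement(contains_symbol("y")))
print(accepts(dfa_c, "zxz"), accepts(dfa_c, "zxy"))  # True False
```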


Answer 1:


Edit

Based on the OP's initial regex, and as @ruakh pointed out in the comments below my answer, the OP has chosen to use ^(?!.*B).*A. That solution rejects any string that contains a match for B, whereas my original post (below) targets strings that match B in their entirety, which is what I originally assumed before the OP clarified the requirement in the comments.
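As a minimal sketch of how that pattern could be assembled in code, assuming Python's re module and a hypothetical helper name: A and B are wrapped in non-capturing groups so that a top-level alternation inside either one does not leak into the rest of C. It also assumes A and B contain no named groups or backreferences that would clash once combined.

```python
import re

def a_and_not_containing_b(a, b):
    """Build ^(?!.*B).*A, wrapping A and B in non-capturing groups."""
    return re.compile(rf"^(?!.*(?:{b})).*(?:{a})")

c = a_and_not_containing_b(r"x", r"y")
print(bool(c.search("zxz")))  # True: contains x, no y
print(bool(c.search("zxy")))  # False: contains y
print(bool(c.search("zzz")))  # False: no x
```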


Original Post

If I understand your question correctly, you want to match a string that matches a given pattern A but does not match a pattern B, using a single new pattern C built from both A and B.

Simple regex

For patterns A and B, the new regex pattern C takes the following general form:

^(?!B$)A$

or, with the sample patterns you presented (A is x and B is y):

^(?!y$)x$

Maybe a better example to demonstrate this is with the following:

  • A pattern: x.
  • B pattern: xx
  • C becomes: ^(?!xx$)x.$

This matches xa but not xx.
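The same construction can be sketched in Python as follows. The helper name is hypothetical, and the non-capturing groups around A and B are my addition so that alternations inside either pattern stay contained. Note that in Python, $ also matches just before a trailing newline.

```python
import re

def a_and_not_whole_b(a, b):
    """Build ^(?!B$)A$ with A and B wrapped in non-capturing groups."""
    return re.compile(rf"^(?!(?:{b})$)(?:{a})$")

c = a_and_not_whole_b(r"x.", r"xx")
print(bool(c.search("xa")))  # True: matches A, is not exactly B
print(bool(c.search("xx")))  # False: the whole string matches B
print(bool(c.search("ya")))  # False: does not match A
```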


Complex regex

With regard to more complex regular expressions, the outcome depends entirely on the patterns and on the regex engine used. The combined expression could time out, and if A or B uses recursion, control verbs, pattern modifiers, etc., it could break entirely.

A better option would be to evaluate both regular expressions independently with code to determine the outcome.
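A minimal sketch of that approach in Python, assuming whole-line matching and a hypothetical function name; replace re.fullmatch with re.search if "matches" should mean "matches somewhere in the line":

```python
import re

def matches_a_not_b(a, b, line):
    """Return True when `a` matches the whole line and `b` does not."""
    return re.fullmatch(a, line) is not None and re.fullmatch(b, line) is None

print(matches_a_not_b(r"x.", r"xx", "xa"))  # True
print(matches_a_not_b(r"x.", r"xx", "xx"))  # False
```

Because the two patterns are compiled and run separately, group names, recursion, and modifiers in one pattern cannot interfere with the other.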

Example 1

Here's an example where the regular expression breaks given that both patterns use the same predefined pattern name:

  • A: (?(DEFINE)(?<t>x))(?&t).
  • B: (?(DEFINE)(?<t>x))(?&t){2}
  • C: ^(?!(?(DEFINE)(?<t>x))(?&t){2}$)(?(DEFINE)(?<t>x))(?&t).$

It fails because the combined pattern defines the group name t twice.
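Python's re module does not support (?(DEFINE)...), but the same kind of name clash can be reproduced with ordinary named groups. This is only an analogous sketch, not the PCRE example above:

```python
import re

a = r"(?P<t>x)."        # A with a named group t
b = r"(?P<t>x){2}"      # B reusing the same group name t
combined = rf"^(?!{b}$){a}$"

try:
    re.compile(combined)
    error = None
except re.error as exc:
    error = exc           # the duplicate group name is rejected at compile time

print(error)
```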

Example 2

Here's a recursion example that fails to work properly:

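  • A: a(?R)?z
  • B: az
  • C: ^(?!az$)a(?R)?z$

It fails because (?R) in C recurses into the whole of C, including the ^ anchor and the lookahead, rather than into A alone. The recursive call can never match at a position past the start of the string, so C matches nothing.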


Of course, all of this rests on the initial assumption that C: ^(?!B$)A$ is the correct pattern for matching A while not matching B.




Answer 2:


My guess is that the answer is most likely no. A, B, and C can be dependent or independent expressions, so the possible outcomes fall into the category of combinations (which also includes permutations), and there would be an infinite number of such expressions. I therefore highly doubt that a single generic algorithm for this exists.



Source: https://stackoverflow.com/questions/57098751/combining-two-regular-expressions-a-and-b-into-c-a-and-not-b
