Simplifying the regex “ab|a|b”

Submitted by 醉酒当歌 on 2019-12-01 22:55:39

Question


Can the following regex be simplified, and if so, how?

ab|a|b

I'm looking for a less redundant version, i.e. one that contains only one a and one b. Is that possible?

Some attempts:

a?b?       # matches the empty string, which it shouldn't
ab?|b      # still contains b twice

Note that in the real regex, a and b are more complicated: each is a sub-regex rather than a single character.


Answer 1:


If you are using Perl or a PCRE-based engine (like PHP's preg_ functions), you can call an earlier group as a subroutine, which reuses that group's sub-pattern (not the text it matched), like this:

/(a)(b)|(?1)|(?2)/

The main purpose of this feature is to support recursion, but it can be used for pattern reuse as well.

Note that in this case you cannot avoid capturing a and b in the first alternative, which incurs some (possibly unnecessary) overhead. To avoid this, you can define the groups inside a conditional that is never executed. The canonical way to do this is to use a (?(DEFINE)...) group (which checks whether a group named DEFINE has matched anything, but of course no such group exists):

/(?(DEFINE)(a)(b))(?1)(?2)|(?1)|(?2)/

If your engine doesn't support that (EDIT: since you are using Java, no, this feature is not supported), the best you can get in a single pattern is indeed

ab?|b
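As a quick sanity check, the following Java sketch compares ab?|b with the original ab|a|b on a handful of inputs. The class name EquivalenceCheck and the sample strings are illustrative assumptions, not part of the original question.

```java
import java.util.regex.Pattern;

public class EquivalenceCheck {
    // The original pattern and the shortened form discussed above.
    static final Pattern ORIGINAL = Pattern.compile("ab|a|b");
    static final Pattern SHORTENED = Pattern.compile("ab?|b");

    // True when both patterns agree on whether the whole input matches.
    static boolean agree(String input) {
        return ORIGINAL.matcher(input).matches() == SHORTENED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs: the three accepted strings plus a few rejects.
        String[] samples = {"", "a", "b", "ab", "ba", "aa", "abb"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> original=" + ORIGINAL.matcher(s).matches()
                    + ", shortened=" + SHORTENED.matcher(s).matches());
        }
    }
}
```

This only spot-checks a few strings; it does not prove equivalence in general.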

Alternatively, you can build the ab|a|b version yourself by string concatenation or formatting, for example:

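String a = "a";  // sub-regex for the first part
String b = "b";  // sub-regex for the second part
// Each part is written once; the alternation is assembled from the variables.
String pattern = a + b + "|" + a + "|" + b;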

This avoids the duplication as well. Or you can run three separate patterns (ab, a and b) against the subject string, where the first is again the concatenation of the other two.
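A slightly fuller sketch of the concatenation approach in Java follows. The sub-regexes \d+ and [a-z]+ are stand-ins (assumptions) for the real, more complicated parts, and the class and method names are hypothetical. Each part is wrapped in a non-capturing group so that an alternation inside a part cannot leak into the surrounding pattern.

```java
import java.util.regex.Pattern;

public class CombinedPattern {
    // Hypothetical stand-ins for the real, more complicated sub-regexes.
    static final String A = "\\d+";
    static final String B = "[a-z]+";

    // Wrap a sub-regex in a non-capturing group so inner alternations stay contained.
    static String group(String regex) {
        return "(?:" + regex + ")";
    }

    // Builds the equivalent of "ab|a|b" with each part written only once in the source.
    static Pattern build(String a, String b) {
        String ga = group(a);
        String gb = group(b);
        return Pattern.compile(ga + gb + "|" + ga + "|" + gb);
    }

    public static void main(String[] args) {
        Pattern p = build(A, B);
        // Print whether each sample input matches the combined pattern as a whole.
        for (String s : new String[] {"123abc", "123", "abc", "", "abc123"}) {
            System.out.println("\"" + s + "\" -> " + p.matcher(s).matches());
        }
    }
}
```

The compiled pattern still contains each part more than once; only the source code is free of duplication, which is usually what matters for maintainability.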



Source: https://stackoverflow.com/questions/16217375/simplifying-the-regex-abab
