Regex multiline mode with optional group skips valid data

Deadly, submitted 2020-01-30 08:45:26

Question


Consider the following example:

$payload = '
ababaaabbb =%=
ababaaabbb =%=
ababaa     =%=
';

$pattern = '/^[ab]+\s*(?:=%=)?$/m';
preg_match_all($pattern, $payload, $matches);
var_dump($matches);

The expected and actual matches are:

"ababaaabbb =%="
"ababaaabbb =%="
"ababaa     =%="

But if $payload is changed to

$payload = '
ababaaabbb =%=
ababaaabbb =%=
ababaa     =%'; // "=" sign removed at EOL

the actual result is

"ababaaabbb =%="
"ababaaabbb =%="

but the expected result is

"ababaaabbb =%="
"ababaaabbb =%="
"ababaa     "

Why does this happen? The group (?:=%=)? is optional because of the trailing ?, so the last line of the payload should also appear in the match results.


Answer 1:


Have a look at the graph of your current regex, /^[ab]+\s*(?:=%=)?$/m:

[Figure: regex graph of /^[ab]+\s*(?:=%=)?$/m]

The =%= is optional (see how the path forks between the whitespace node and the End of line node), but the end of line (EOL) itself is required. That means that after one or more a or b characters and zero or more whitespace characters, the EOL must occur, optionally preceded by =%=. However, your 3rd line ends with =%, which is neither =%= nor the EOL => NO MATCH.

Now move the $ anchor into the optional group, giving /^[ab]+\s*(?:=%=$)?/m:

[Figure: regex graph of /^[ab]+\s*(?:=%=$)?/m]

The end of line is now optional too, so a match is returned as soon as one or more a or b characters and any optional whitespace have been matched. On the 3rd line that is "ababaa     ".
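As a sketch of this suggestion in PHP, the snippet below runs the moved-$ pattern against the question's second payload. The payload is rebuilt with implode() so the leading newline and the missing trailing newline are explicit; that construction is an assumption meant to mirror the question's string literal.

```php
<?php
// Question's second payload (assumed reconstruction): leading newline,
// no trailing newline, and the last line ends in "=%" instead of "=%=".
$payload = implode("\n", ['', 'ababaaabbb =%=', 'ababaaabbb =%=', 'ababaa     =%']);

// $ moved inside the optional group: the end-of-line check only
// applies when "=%=" is actually present.
$pattern = '/^[ab]+\s*(?:=%=$)?/m';

preg_match_all($pattern, $payload, $matches);
var_dump($matches[0]);
```

On that payload this should yield "ababaaabbb =%=" twice and then "ababaa     " (trailing spaces included). One trade-off: because $ is no longer mandatory, any line that merely starts with a or b now produces a match, e.g. "abxyz" yields "ab".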




Answer 2:


Since the last line ends with =%, you should make the last = optional as well and use a capturing group for your expected data:

/^([ab]+\s*)(?:=%=?)?$/m


PS: Your expected result is available in capturing group #1 ($matches[1] with preg_match_all).
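A minimal PHP sketch of this answer, assuming the question's second payload rebuilt with implode() to mirror the original string literal, prints both the full matches and capturing group #1:

```php
<?php
// Question's second payload (assumed reconstruction).
$payload = implode("\n", ['', 'ababaaabbb =%=', 'ababaaabbb =%=', 'ababaa     =%']);

// Optional "=%" with an optional final "=", then a mandatory end of line.
// Group 1 captures the a/b run plus any trailing whitespace.
$pattern = '/^([ab]+\s*)(?:=%=?)?$/m';

preg_match_all($pattern, $payload, $matches);
var_dump($matches[0]); // full matches
var_dump($matches[1]); // capturing group #1
```

Group 1 should hold "ababaaabbb ", "ababaaabbb " and "ababaa     ", so the =%= suffix of the first two lines is not part of it; the full matches in $matches[0] keep the suffix, including the =% on the last line.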




Answer 3:


The group (?:=%=)? is optional in your regular expression. That does not mean each part of that group is also optional.

Your regex works only if it sees a string of a's and b's, optional whitespace, then either (1) =%= followed by the end of the line or (2) just the end of the line. It will not work if, after the a's, b's and whitespace, it sees anything other than exactly =%= or the end of the line. So =% won't work.
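To see this behaviour line by line, here is a small PHP check of the question's pattern against three sample tails. The sample lines are made up for illustration and are not from the question.

```php
<?php
// The question's original pattern.
$pattern = '/^[ab]+\s*(?:=%=)?$/m';

// Hypothetical sample lines, one per case described above.
$lines = [
    'ab =%=', // (1) "=%=" followed by the end of the line
    'ab ',    // (2) just the end of the line (after optional whitespace)
    'ab =%',  // anything else, here "=%"
];

$results = [];
foreach ($lines as $line) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$line] = preg_match($pattern, $line);
    echo json_encode($line), ' => ', $results[$line], "\n";
}
```

Only the first two lines should be reported as matches (1); the =% tail gives 0.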

To accomplish what you apparently want to do, you need to make the second = optional, like so:

$pattern = '/^[ab]+\s*(?:=%=?)?$/m';
// see the additional ? here^

But it seems like you don't want the =% at all in this scenario, which means you need to get more creative still:

$pattern = '/^[ab]+\s*(?:(?:=%=)?$|(?==%$))/m';
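The PHP sketch below compares this answer's two patterns on the question's second payload (rebuilt with implode(), an assumption that mirrors the original string literal); the array keys are arbitrary labels.

```php
<?php
// Question's second payload (assumed reconstruction).
$payload = implode("\n", ['', 'ababaaabbb =%=', 'ababaaabbb =%=', 'ababaa     =%']);

$patterns = [
    // Second "=" optional: "=%" is accepted but becomes part of the match.
    'optional_eq' => '/^[ab]+\s*(?:=%=?)?$/m',
    // Either "=%=" (or nothing) before EOL, or stop in front of a
    // trailing "=%" without consuming it (zero-width lookahead).
    'lookahead'   => '/^[ab]+\s*(?:(?:=%=)?$|(?==%$))/m',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $payload, $matches);
    $results[$name] = $matches[0];
    echo $name, ': ', json_encode($matches[0]), "\n";
}
```

The optional-= pattern should still include =% in the last match ("ababaa     =%"), while the lookahead version stops before it and returns "ababaa     ". The lookahead (?==%$) checks for =% at the end of the line without consuming it, which is why those characters are left out of the match.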




Source: https://stackoverflow.com/questions/42121396/regex-multiline-mode-with-optional-group-skip-valid-data
