std::regex_match and lazy quantifier with strange behavior

↘锁芯ラ 提交于 2019-12-20 06:28:41

问题


I know that:
Lazy quantifier matches: As Few As Possible (shortest match)

Also know that the constructor:

basic_regex( ...,
            flag_type f = std::regex_constants::ECMAScript );

And:
ECMAScript supports non-greedy matches,
and the ECMAScript regex "<tag[^>]*>.*?</tag>"
would match only until the first closing tag ... en.cppreference

And:
At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... en.cppreference

And:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences...std::regex_match


Here is my code: + Live

#include <iostream>
#include <string>
#include <regex>

int main(){

        std::string string( "s/one/two/three/four/five/six/g" );
        std::match_results< std::string::const_iterator > match;
        std::basic_regex< char > regex ( "s?/.+?/g?" );  // non-greedy
        bool test = false;

        using namespace std::regex_constants;

        // okay recognize the lazy operator .+?
        test = std::regex_search( string, match, regex );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
        // does not recognize the lazy operator .+?
        test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
} 

and the output:

1
s/one/
1
s/one/two/three/four/five/six/g

Process returned 0 (0x0)   execution time : 0.008 s
Press ENTER to continue.

std::regex_match should not match anything and it should return 0 with non-greedy quantifier .+?

In fact, here, the non-greedy .+? quantifier has the same meaning as greedy one, and both /.+?/ and /.+/ match the same string. They are different patterns. So the problem is why the question mark is ignored?

regex101

Fast test:

$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
$ s/one/
$
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
$ s/one/two/three/four/five/six/g

NOTE
this regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); non-greedy
and this : std::basic_regex< char > regex ( "s?/.+/g?" ); greedy
have the same output with std::regex_match. Still both match the entire of the string!
But with std::regex_search have the different output.
Also s? or g? does not matter and with /.*?/ still matches the entire of the string!

More Detail

g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901

回答1:


I don't see any inconsistency. regex_match tries to match the whole string, so s?/.+?/g? lazily expands till the whole string is covered.

These "diagrams" (for regex_search) will hopefully help to get the idea of greediness:

Non-greedy:

a.*?a: ababa
a|.*?a: a|baba
a.*?|a: a|baba  # ok, let's try .*? == "" first
# can't go further, backtracking
a.*?|a: ab|aba  # lets try .*? == "b" now
a.*?a|: aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".

Greedy:

a.*?a: ababa
a|.*a: a|baba
a.*|a: ababa|  # try .* == "baba" first
# backtrack
a.*|a: abab|a  # try .* == "bab" now
a.*a|: ababa|

And regex_match( abc ) is like regex_search( ^abc$ ) in this case.



来源:https://stackoverflow.com/questions/42422778/stdregex-match-and-lazy-quantifier-with-strange-behavior

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!