Question
I know that a lazy quantifier matches as few characters as possible (the shortest match).
I also know that the constructor
basic_regex( ..., flag_type f = std::regex_constants::ECMAScript );
uses the ECMAScript grammar by default, and:
ECMAScript supports non-greedy matches, and the ECMAScript regex "<tag[^>]*>.*?</tag>" would match only until the first closing tag ...
en.cppreference
And:
At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ...
en.cppreference
And:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences ...
std::regex_match
Here is my code:
#include <iostream>
#include <string>
#include <regex>

int main(){
    std::string string( "s/one/two/three/four/five/six/g" );
    std::match_results< std::string::const_iterator > match;
    std::basic_regex< char > regex ( "s?/.+?/g?" ); // non-greedy
    bool test = false;
    using namespace std::regex_constants;

    // OK: regex_search honors the lazy quantifier .+?
    test = std::regex_search( string, match, regex );
    std::cout << test << '\n';
    std::cout << match.str() << '\n';

    // regex_match seems to ignore the lazy quantifier .+?
    test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
    std::cout << test << '\n';
    std::cout << match.str() << '\n';
}
and the output:
1
s/one/
1
s/one/two/three/four/five/six/g
Process returned 0 (0x0) execution time : 0.008 s
Press ENTER to continue.
With the non-greedy quantifier .+?, I expected std::regex_match not to match anything and to return 0.
In fact, the non-greedy quantifier .+? behaves here exactly like the greedy one: both /.+?/ and /.+/ match the same string, even though they are different patterns.
So the question is: why is the question mark ignored?
regex101
Fast test:
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
$ s/one/
$
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
$ s/one/two/three/four/five/six/g
NOTE
This regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); // non-greedy
and this one: std::basic_regex< char > regex ( "s?/.+/g?" ); // greedy
produce the same output with std::regex_match: both match the entire string!
With std::regex_search, however, they produce different output.
Also, s? and g? do not matter: with /.*?/ it still matches the entire string!
More Detail
g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901
Answer 1:
I don't see any inconsistency. regex_match tries to match the whole string, so s?/.+?/g? lazily expands until the whole string is covered.
These "diagrams" (for regex_search) will hopefully help to get the idea of greediness:
Non-greedy:
a.*?a:   ababa
a|.*?a:  a|baba
a.*?|a:  a|baba   # ok, let's try .*? == "" first
                  # can't go further, backtracking
a.*?|a:  ab|aba   # let's try .*? == "b" now
a.*?a|:  aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".
Greedy:
a.*a:    ababa
a|.*a:   a|baba
a.*|a:   ababa|   # try .* == "baba" first
                  # backtrack
a.*|a:   abab|a   # try .* == "bab" now
a.*a|:   ababa|
And regex_match( abc ) is like regex_search( ^abc$ ) in this case.
Source: https://stackoverflow.com/questions/42422778/stdregex-match-and-lazy-quantifier-with-strange-behavior