Question
My regex_replace expression uses group $1 right before a '0' character in the replacement string like so:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
    regex regex_a( "(.*)bar(.*)" );
    cout << regex_replace( "foobar0x1", regex_a, "$10xNUM" ) << endl;
    cout << regex_replace( "foobar0x1", regex_a, "$1 0xNUM" ) << endl;
}
The output is:
xNUM
foo 0xNUM
I'm trying to get the output foo0xNUM, without the whitespace in the middle.
How do I guard the group name $1 from the next character in the substitution string?
Answer 1:
You can reference captured text with either $n or $nn, so use the two-digit $nn form (here $01) to keep the following 0 from being read as part of the group number.
cout << regex_replace( "foobar0x1", regex_a, "$010xNUM" ) << endl;
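The same idea can be wrapped in a small helper so the two-digit form is always used. This is only a sketch: group_ref and replace_with_guarded_group are hypothetical names introduced here for illustration, not part of <regex>.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of <regex>): builds a two-digit group
// reference such as "$01", so a digit that follows it in the format
// string is never read as part of the group number.
std::string group_ref(unsigned n) {
    assert(n >= 1 && n <= 99);  // the $nn form covers 01 to 99
    std::string ref = "$";
    ref += static_cast<char>('0' + n / 10);
    ref += static_cast<char>('0' + n % 10);
    return ref;
}

// Replaces the match with group 1 followed by the literal "0xNUM",
// using the pattern from the question.
std::string replace_with_guarded_group(const std::string& input) {
    const std::regex regex_a("(.*)bar(.*)");
    return std::regex_replace(input, regex_a, group_ref(1) + "0xNUM");
}
```

Calling replace_with_guarded_group("foobar0x1") builds the format string "$010xNUM", the same one used in the line above.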
Answer 2:
Guvante has provided a solution to this problem.
However, is the behavior well-defined according to the specification?
To start with the conclusion: yes, the solution has well-defined behavior.
C++ specification
The documentation of format_default, which specifies ECMA rules to interpret the format string, points to Section 15.5.4.11 of ECMA-262.
ECMA-262 specification
According to Table 22 in Section 15.5.4.11 of the ECMA-262 specification:
$n: The nth capture, where n is a single digit in the range 1 to 9 and $n is not followed by a decimal digit. If n ≤ m and the nth capture is undefined, use the empty String instead. If n > m, the result is implementation-defined.
$nn: The nnth capture, where nn is a two-digit decimal number in the range 01 to 99. If nn ≤ m and the nnth capture is undefined, use the empty String instead. If nn > m, the result is implementation-defined.
The variable m is defined in the previous paragraph of the same section:
[...] Let m be the number of left capturing parentheses in searchValue (using NcapturingParens as specified in 15.10.2.1).
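To make the two table rows concrete, here is a minimal sketch of how a format-string reader could choose between $n and $nn. It is a literal reading of the quoted rules, not the code of any actual standard library; GroupRef and read_group_ref are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Result of reading a group reference that starts at a '$'.
struct GroupRef {
    int group;           // capture group number, or -1 if no group reference
    std::size_t length;  // characters consumed including '$', or 0
};

// Illustrative only: applies the $n and $nn rows of the table to the
// '$' found at fmt[pos].
GroupRef read_group_ref(const std::string& fmt, std::size_t pos) {
    assert(pos < fmt.size() && fmt[pos] == '$');
    auto digit_at = [&fmt](std::size_t i) {
        return i < fmt.size() &&
               std::isdigit(static_cast<unsigned char>(fmt[i])) != 0;
    };
    if (!digit_at(pos + 1)) return {-1, 0};
    const int first = fmt[pos + 1] - '0';
    if (digit_at(pos + 2)) {
        // $nn: a digit follows, so the $n row cannot apply.
        const int nn = first * 10 + (fmt[pos + 2] - '0');
        if (nn >= 1) return {nn, 3};
        return {-1, 0};  // "$00" is outside the 01 to 99 range
    }
    // $n: a single digit 1 to 9 that is not followed by another digit.
    if (first >= 1) return {first, 2};
    return {-1, 0};
}
```

Under this reading, "$10xNUM" yields group 10, while "$010xNUM" and "$1 0xNUM" both yield group 1.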
Replacement string in the question: "$10xNUM"
Back to the code in the question:
cout << regex_replace( "foobar0x1", regex_a, "$10xNUM" ) << endl;
Since $1 is followed by 0, it has to be interpreted under the second rule, $nn, because the first rule forbids any digit from following $n. However, since the pattern has only 2 capturing groups (m = 2) and 10 > 2, the behavior is implementation-defined according to the specification.
We can see the effect of the implementation-defined clause by comparing the result of functionally equivalent JavaScript code in Firefox 37.0.1:
> "foobar0x1".replace(/(.*)bar(.*)/g, "$10xNUM" )
< "foo0xNUM"
As you can see, Firefox interprets $10 as the value of the first capturing group, $1, followed by the literal string 0. This is a valid implementation according to the specification, under the nn > m condition in the $nn clause.
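In C++, the value of m for a given pattern is available through std::regex::mark_count(), so the implementation-defined case can be detected before the replacement runs. The following is a rough sketch under that assumption; max_group_referenced, stays_within_groups and replace_checked are hypothetical helpers, and the scan handles only $n, $nn and $$.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: largest group number referenced by fmt, reading
// '$' followed by one or two digits as a single reference.
unsigned max_group_referenced(const std::string& fmt) {
    unsigned max_ref = 0;
    for (std::size_t i = 0; i + 1 < fmt.size(); ++i) {
        if (fmt[i] != '$') continue;
        if (fmt[i + 1] == '$') { ++i; continue; }  // "$$" is a literal '$'
        unsigned ref = 0;
        for (std::size_t j = i + 1; j < fmt.size() && j < i + 3 &&
             std::isdigit(static_cast<unsigned char>(fmt[j])); ++j) {
            ref = ref * 10 + static_cast<unsigned>(fmt[j] - '0');
        }
        if (ref > max_ref) max_ref = ref;
    }
    return max_ref;
}

// True when every reference in fmt is at most m (re.mark_count()),
// that is, outside the implementation-defined case.
bool stays_within_groups(const std::regex& re, const std::string& fmt) {
    return max_group_referenced(fmt) <= re.mark_count();
}

// Debug-time guard around std::regex_replace.
std::string replace_checked(const std::string& input, const std::regex& re,
                            const std::string& fmt) {
    assert(stays_within_groups(re, fmt));
    return std::regex_replace(input, re, fmt);
}
```

For the pattern in the question, mark_count() is 2, so "$10xNUM" is flagged (10 > 2) while "$010xNUM" passes.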
Replacement string in Guvante's answer: "$010xNUM"
Same as above, the $nn clause applies, since the $n clause forbids a following digit. Since 01 in $01 does not exceed the number of capturing groups (m = 2), the behavior is well-defined: the content of capturing group 1 is used in the replacement.
Therefore, Guvante's answer will return the same result on any conforming C++ implementation.
Answer 3:
I tried to find a way of simply escaping the space or something similar so it wouldn't print, but I was unable to.
However, the part you are trying to add could simply be appended to the end of the regex output:
cout << regex_replace( "foobar0x1", regex_a, "$1" ) << "0xNUM" << endl;
The above line would give you the output you want.
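Appending after the call works here because the pattern matches the entire input and the literal comes last. If the literal has to sit in the middle of a longer string, one possible variation is to skip the format string and assemble the result from the match object. This sketch handles only the first match, and replace_first_with_group is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: replaces the first match of re in input with
// capture group `group` followed by `literal`, with no format string
// and therefore no ambiguity about digits after the group number.
std::string replace_first_with_group(const std::string& input,
                                     const std::regex& re,
                                     std::size_t group,
                                     const std::string& literal) {
    assert(group <= re.mark_count());  // the group must exist in the pattern
    std::smatch m;
    if (!std::regex_search(input, m, re))
        return input;  // no match: return the input unchanged
    // Text before the match + chosen group + literal + text after the match.
    return m.prefix().str() + m[group].str() + literal + m.suffix().str();
}
```

With the pattern from the question, replace_first_with_group("foobar0x1", regex_a, 1, "0xNUM") assembles the prefix (empty), group 1 ("foo"), the literal, and the suffix (empty).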
Source: https://stackoverflow.com/questions/29809811/c11-regex-digit-after-capturing-group-in-replacement-string