C++11 regex: digit after capturing group in replacement string

Submitted by ∥☆過路亽.° on 2019-12-30 08:08:48

Question


My regex_replace expression uses group $1 right before a '0' character in the replacement string like so:

#include <iostream>
#include <string>
#include <regex>

using namespace std;

int main() {
    regex regex_a( "(.*)bar(.*)" );
    cout << regex_replace( "foobar0x1", regex_a, "$10xNUM" ) << endl;
    cout << regex_replace( "foobar0x1", regex_a, "$1 0xNUM" ) << endl;
}

The output is:

xNUM
foo 0xNUM

I'm trying to get output foo0xNUM without the middle whitespace.

How do I guard the group name $1 from the next character in the substitution string?


Answer 1:


You may reference captured text as either $n or $nn, so you can use the two-digit $nn form (here $01) to keep the reference from absorbing the 0.

cout << regex_replace( "foobar0x1", regex_a, "$010xNUM" ) << endl;
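For reuse outside the question's main(), the same fix can be wrapped in a small function. This is only a sketch: the helper name replace_with_two_digit_ref is hypothetical, and the pattern and replacement string are copied from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original post): replaces the match
// with capture group 1 followed by the literal text "0xNUM".
std::string replace_with_two_digit_ref(const std::string& input) {
    // Same pattern as regex_a in the question.
    static const std::regex regex_a("(.*)bar(.*)");
    // "$01" is the two-digit spelling of "$1", so the '0' that follows it
    // is copied literally instead of being read as part of the group number.
    return std::regex_replace(input, regex_a, "$010xNUM");
}
```

Calling replace_with_two_digit_ref("foobar0x1") yields foo0xNUM; input that does not contain "bar" is returned unchanged because regex_replace copies unmatched text as-is.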



Answer 2:


Guvante has provided a solution to this problem.

However, is the behavior well-defined according to the specification?

To start with the conclusion: yes, the solution has well-defined behavior.

C++ specification

The documentation of format_default, which specifies ECMA rules to interpret the format string, points to Section 15.5.4.11 of ECMA-262.

ECMA-262 specification

According to Table 22 in Section 15.5.4.11 of the ECMA-262 specification:

$n

The nth capture, where n is a single digit in the range 1 to 9 and $n is not followed by a decimal digit. If n ≤ m and the nth capture is undefined, use the empty String instead. If n > m, the result is implementation-defined.

$nn

The nnth capture, where nn is a two-digit decimal number in the range 01 to 99. If nn ≤ m and the nnth capture is undefined, use the empty String instead. If nn > m, the result is implementation-defined.

The variable m is defined in the previous paragraph of the same section:

[...] Let m be the number of left capturing parentheses in searchValue (using NcapturingParens as specified in 15.10.2.1).

Replacement string in the question "$10xNUM"

Back at the code in the question:

cout << regex_replace( "foobar0x1", regex_a, "$10xNUM" ) << endl;

Since $1 is followed by 0, it has to be interpreted as the second rule $nn, as the first rule forbids any digit to follow $n. However, since the pattern only has 2 capturing groups (m = 2) and 10 > 2, the behavior is implementation-defined according to the specification.

We can see the effect of the implementation-defined clause by comparing the result of functionally equivalent JavaScript code in Firefox 37.0.1:

> "foobar0x1".replace(/(.*)bar(.*)/g, "$10xNUM" )
< "foo0xNUM"

As you can see, Firefox interprets $10 as the first capturing group $1 followed by the literal character 0. This is a valid implementation according to the specification, because the nn > m condition in the $nn clause leaves the result implementation-defined.

Replacement string in Guvante's answer: "$010xNUM"

As above, the $nn clause applies, since the $n clause forbids a digit from following. Since 01 in $01 is not greater than the number of capturing groups (m = 2), the behavior is well-defined: the content of capturing group 1 is used in the replacement.

Therefore, Guvante's answer will return the same result on any conforming C++ standard library implementation.
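The comparison against m can also be checked from C++ itself, because std::regex exposes the number of capturing groups through mark_count(). The sketch below encodes only the "1 ≤ nn ≤ m" condition from the $nn rule; the function name is_well_defined_group_ref is hypothetical, and the code does not parse replacement strings.

```cpp
#include <regex>

// Hypothetical helper: true when the group number nn refers to a capturing
// group that exists in re, i.e. the case the $nn rule defines fully.
// When nn > m, the rule leaves the result implementation-defined.
bool is_well_defined_group_ref(const std::regex& re, unsigned nn) {
    // mark_count() returns the number of capturing groups in the pattern,
    // which plays the role of m in the ECMA-262 text.
    return nn >= 1 && nn <= re.mark_count();
}
```

For the pattern "(.*)bar(.*)", mark_count() is 2, so the helper returns true for 1 (the $01 case) and false for 10 (the $10 case).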




Answer 3:


I tried to find a way of escaping the space so that it wouldn't be printed, but I was unable to.

However, the text you are trying to add could simply be appended to the end of the regex output:

cout << regex_replace( "foobar0x1", regex_a, "$1" ) << "0xNUM" << endl;

The above line would give you the output you want.
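A related way to sidestep the format string entirely is to read the capture from a match object and build the result by hand. This is a sketch under assumptions: the helper name replace_without_format_string is hypothetical, and it uses regex_match, which requires the pattern to cover the whole input. That holds for the question's single-line input, but it is not a general substitute for regex_replace.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds "<group 1>0xNUM" without a "$..." reference,
// so there is no group number for the following '0' to be confused with.
std::string replace_without_format_string(const std::string& input) {
    // Same pattern as regex_a in the question.
    static const std::regex regex_a("(.*)bar(.*)");
    std::smatch match;
    // Leave the input untouched when the pattern does not match it.
    if (!std::regex_match(input, match, regex_a)) {
        return input;
    }
    // match[1] is the text captured by the first group.
    return match[1].str() + "0xNUM";
}
```

For "foobar0x1" this returns foo0xNUM, the same output as the line above.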



Source: https://stackoverflow.com/questions/29809811/c11-regex-digit-after-capturing-group-in-replacement-string
