Question
Could someone please help me extract the text between the : and ^ symbols using a JavaScript (ECMAScript) regular expression in C++11? I do not need to capture the hw-descriptor itself, but it has to be present in the line for the rest of the line to be considered for a match. Also, the :p....^, :m....^ and :u....^ fields can arrive in any order, and at least one of them has to be present.
I tried using the following regular expression:
static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
against the following text line:
"hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^"
Here is the code, which I posted live on Coliru. It shows how I attempted to solve this problem; however, I am only getting 1 match. I need to see how to extract each of the up to 3 potential matches corresponding to the p, m or u characters described earlier.
#include <algorithm>  // std::for_each
#include <iostream>
#include <string>
#include <vector>
#include <regex>
int main()
{
    static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^";
    // I seem to get only 1 match here. I was expecting to loop through
    // each of the matches; it looks like I need something like a PCRE
    // global option, but I don't know how to get that.
    std::for_each(std::sregex_iterator(foo.cbegin(), foo.cend(), gRegex), std::sregex_iterator(),
        [&](const auto& rMatch) {
            // Print the whole match followed by every capture group.
            for (int i = 0; i < static_cast<int>(rMatch.size()); ++i) {
                std::cout << rMatch[i] << std::endl;
            }
        });
}
The above program gives the following output:
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
:uTEXT3^
TEXT3
Answer 1:
With std::regex, you cannot keep multiple repeated captures when matching a string with consecutive repeated patterns: a repeated capturing group only retains the text of its last iteration.
What you may do is match the overall text containing the prefix and the repeated chunks, capture the latter into a separate group, and then use a second, smaller regex to grab all the occurrences of the substrings you want separately.
The first regex here may be
hw-descriptor((?::[pmu][^^]*\^)+)
(in a C++ string literal the backslash has to be doubled, i.e. \\^). It will match hw-descriptor, and ((?::[pmu][^^]*\^)+) will capture into Group 1 one or more repetitions of the :[pmu][^^]*\^ pattern: a :, then p/m/u, then 0 or more chars other than ^, and then ^. Upon finding a match, use the :[pmu][^^]*\^ regex to return all the real "matches".
C++ demo:
#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Outer regex: the prefix plus all repeated chunks, captured as Group 1.
    static const std::regex gRegex("hw-descriptor((?::[pmu][^^]*\\^)+)", std::regex::icase);
    // Inner regex: a single :p...^ / :m...^ / :u...^ chunk.
    static const std::regex lRegex(":[pmu][^^]*\\^", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^ hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^";
    for (std::sregex_iterator i = std::sregex_iterator(foo.begin(), foo.end(), gRegex);
         i != std::sregex_iterator();
         ++i)
    {
        std::smatch m = *i;
        std::cout << "Match value: " << m.str() << std::endl;
        // Run the inner regex over Group 1 only.
        std::string x = m.str(1);
        for (std::sregex_iterator j = std::sregex_iterator(x.begin(), x.end(), lRegex);
             j != std::sregex_iterator();
             ++j)
        {
            std::cout << "Element value: " << (*j).str() << std::endl;
        }
    }
    return 0;
}
Output:
Match value: hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
Element value: :pTEXT1^
Element value: :mTEXT2^
Element value: :uTEXT3^
Match value: hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^
Element value: :pTEXT8^
Element value: :mTEXT8^
Element value: :uTEXT83^
Source: https://stackoverflow.com/questions/39399371/c11-regex-matching-capturing-group-multiple-times