Question
I'm looking for a way to split a string on multiple delimiters using regex in C++ without losing the delimiters: the output should keep each delimiter, in order, alongside the split parts. For example:
Input
aaa,bbb.ccc,ddd-eee;
Output
aaa , bbb . ccc , ddd - eee ;
I've found some solutions for this, but all of them are in C# or Java. I'm looking for a C++ solution, preferably without using Boost.
Answer 1:
You could build your solution on top of the example for regex_iterator. If, for example, you know your delimiters are comma, period, semicolon, and hyphen, you could use a regex that captures either a delimiter or a series of non-delimiters:
([.,;-]|[^.,;-]+)
Drop that into the sample code and you end up with something like this:
#include <iostream>
#include <string>
#include <regex>

int main ()
{
    // the following two lines are edited; the remainder are directly from the reference.
    std::string s ("aaa,bbb.ccc,ddd-eee;");
    std::regex e ("([.,;-]|[^.,;-]+)"); // matches delimiters or consecutive non-delimiters

    std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
    std::regex_iterator<std::string::iterator> rend;

    // print each match (a delimiter or a run of non-delimiters) on its own line
    while (rit!=rend) {
        std::cout << rit->str() << std::endl;
        ++rit;
    }
    return 0;
}
Try substituting in any other regular expressions you like.
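If the tokens are needed as a container rather than printed, the same loop can be wrapped in a small function. The following is only a sketch: the name tokenize and its signature are assumptions made for illustration and are not part of the referenced sample.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions): collects every
// match of `pattern` found in `input`, in order of appearance.
std::vector<std::string> tokenize(const std::string& input, const std::regex& pattern)
{
    std::vector<std::string> tokens;
    // std::sregex_iterator is std::regex_iterator<std::string::const_iterator>;
    // a default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Calling tokenize with the string and regex from the example above should return the delimiters and the runs between them as separate elements.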
Answer 2:
For your case, splitting the input string at every word boundary \b except the one at the start of the string will give you the desired output.
(?!^)\b
OR
(?<=\W)(?!$)|(?!^)(?=\W)
(?<=\W)(?!$)
Matches a boundary that follows a non-word character, except the one at the end of the string.
|
OR
(?!^)(?=\W)
Matches a boundary that is followed by a non-word character, except the one at the start of the string.
In a C++ string literal, escape each backslash one more time (for example, "(?!^)\\b"), or use a raw string literal.
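One caveat for C++: the default ECMAScript grammar of std::regex does not support lookbehind, so the (?<=\W) pattern cannot be used with it as written. As a rough sketch of an alternative, and not the method given in this answer, the code below swaps the split for a match: instead of splitting at zero-width boundaries, it collects the runs that lie between them. The function name split_at_word_boundaries is an assumption made for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption for illustration).
// Rather than splitting at zero-width word boundaries, it collects the runs
// that lie between them: one or more word characters, or one or more
// non-word characters.
std::vector<std::string> split_at_word_boundaries(const std::string& input)
{
    // Raw string literal, so the backslashes need no extra escaping.
    static const std::regex runs(R"(\w+|\W+)");
    std::vector<std::string> parts;
    for (std::sregex_iterator it(input.begin(), input.end(), runs), end; it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

As with a split on \b, adjacent non-word characters such as ",." stay together in one part; the sample input has no adjacent delimiters, so each delimiter ends up in its own part.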
Source: https://stackoverflow.com/questions/27706443/c-spliting-string-by-delimiters-and-keeping-the-delimiters-in-result