Question
I want to split my sentence using whitespace as the delimiter, except for escaped whitespace. How can I do this with boost::split and a regex? If that isn't possible, what else can I use?
Example:
std::string sentence = "My dog Fluffy\\ Cake likes to jump";
Result:
My
dog
Fluffy\ Cake
likes
to
jump
Answer 1:
Three implementations:
- With Boost Spirit
- With Boost Regex
- Handwritten parser
With Boost Spirit
Here's how I'd do this with Boost Spirit. This might seem like overkill, but experience has taught me that once you start splitting input text, you will likely need more parsing logic soon.
Boost Spirit shines when you scale from "just splitting tokens" to a real grammar with production rules.
```cpp
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";

    using It = std::string::const_iterator;
    It f = sentence.begin(), l = sentence.end();

    std::vector<std::string> words;
    bool ok = qi::phrase_parse(f, l,
            // words: an escaped character or any printable non-space character.
            // '\\' is a literal with no attribute, so the backslash itself is dropped.
            *qi::lexeme [ +('\\' >> qi::char_ | qi::graph) ],
            qi::space - "\\ ",  // skipper: whitespace between words
            words);

    if (ok) {
        std::cout << "Parsed:\n";
        for (auto& w : words)
            std::cout << "\t'" << w << "'\n";
    } else {
        std::cout << "Parse failed\n";
    }

    // Report anything the grammar did not consume
    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}
```
With Boost Regex
This looks really succinct, but it
- requires linking to boost_regex
- uses a "black magic" negative lookbehind assertion: http://www.regular-expressions.info/lookaround.html
```cpp
#include <iostream>
#include <boost/regex.hpp>
#include <boost/algorithm/string_regex.hpp>
#include <string>
#include <vector>

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";
    std::vector<std::string> words;

    // Split at any whitespace character that is not preceded by a backslash
    boost::algorithm::split_regex(words, sentence, boost::regex("(?<!\\\\)\\s"), boost::match_default);

    for (auto& w : words)
        std::cout << " '" << w << "'\n";
}
```
Using C++11 raw string literals, you could write the regular expression slightly less obscurely as `boost::regex(R"((?<!\\)\s)")`, meaning "any whitespace not preceded by a backslash".
Handwritten parser
This is somewhat more tedious, but, like the Spirit grammar, it is completely generic (it works with any pair of input iterators and any output iterator) and allows good performance.
However, it doesn't scale nearly as gracefully as the Spirit approach once you start adding complexity to your grammar. On the plus side, it compiles much faster than the Spirit version.
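```cpp
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Split [f, l) on unescaped whitespace and write each token to `out`.
// A backslash followed by a space yields a literal space in the token.
template <typename It, typename Out>
Out tokens(It f, It l, Out out) {
    std::string accum;

    // Emit the current token (if any) and start a new one
    auto flush = [&] {
        if (!accum.empty()) {
            *out++ = accum;
            accum.resize(0);
        }
    };

    while (f != l) {
        switch (*f) {
        case '\\':
            if (++f != l && *f == ' ')
                accum += *f++;   // escaped space: keep it and consume it
            else
                accum += '\\';   // not an escape: keep the backslash as-is
            break;
        case ' ': case '\t': case '\r': case '\n':
            ++f;
            flush();             // unescaped whitespace ends the token
            break;
        default:
            accum += *f++;
        }
    }
    flush();
    return out;
}

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";
    std::vector<std::string> words;
    tokens(sentence.begin(), sentence.end(), std::back_inserter(words));
    for (auto& w : words)
        std::cout << "\t'" << w << "'\n";
}
```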
Source: https://stackoverflow.com/questions/29380897/how-to-split-a-sentence-with-an-escaped-whitespace