Using escaped_list_separator with boost split

夙愿已清 submitted on 2019-12-04 10:54:11

There doesn't seem to be a simple way to do this with boost::split. The shortest piece of code I can find is:

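// Needs <boost/tokenizer.hpp> and <boost/foreach.hpp>;
// assumes "using namespace std;" and "using namespace boost;".
vector<string> tokens;
tokenizer<escaped_list_separator<char> > t(str, escaped_list_separator<char>("\\", ",", "\""));
BOOST_FOREACH(string s, t)   // iterate over the tokenizer declared above
    tokens.push_back(s);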

which is only marginally more verbose than the original snippet:

vector<string> tokens;  
boost::split(tokens, str, boost::is_any_of(","));

This will achieve the same result as Jamie Cook's answer without the explicit loop.

tokenizer<escaped_list_separator<char> > tok(str);
vector<string> tokens(tok.begin(), tok.end());

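The tokenizer constructor's second parameter defaults to a default-constructed escaped_list_separator<char>, which is equivalent to escaped_list_separator<char>("\\", ",", "\""), so you only need to pass it if you have different requirements for the escape, separator, or quote characters.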
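To make the three roles (escape, separator, quote) concrete, here is a rough hand-rolled sketch of the same rules using only the standard library. split_escaped_list is a hypothetical helper, not Boost's implementation, and it simplifies escapes by keeping whatever character follows the backslash; the real class treats escape sequences more strictly.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost: splits `line` using the same three
// roles as the default escaped_list_separator (escape '\\', separator ',',
// quote '"'). Simplification: an escape keeps the next character literally.
std::vector<std::string> split_escaped_list(const std::string& line)
{
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;

    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '\\' && i + 1 < line.size()) {
            current += line[++i];          // escaped character, taken literally
        } else if (c == '"') {
            in_quotes = !in_quotes;        // quote toggles state and is dropped
        } else if (c == ',' && !in_quotes) {
            fields.push_back(current);     // unquoted separator ends the field
            current.clear();
        } else {
            current += c;                  // ordinary character (or quoted comma)
        }
    }
    fields.push_back(current);             // last field has no trailing separator
    return fields;
}
```

With this sketch, the input a,"b,c",d comes back as the three fields a, b,c and d: the comma inside the quotes does not split, which is the behaviour a plain boost::split on "," cannot give you.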

I don't know about the boost::string library, but with Boost's regex_token_iterator you can express delimiters as regular expressions. So yes, you can use quoted delimiters, and far more complex things as well.

Note that this used to be done with regex_split, which is now deprecated.

Here's an example taken from the Boost documentation:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, char* argv[])
{
   string s;
   do{
      if(argc == 1)
      {
         cout << "Enter text to split (or \"quit\" to exit): ";
         getline(cin, s);
         if(s == "quit") break;
      }
      else
         s = "This is a string of tokens";

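      // Split on runs of whitespace. Submatch index -1 makes the iterator
      // return the text between matches instead of the matches themselves.
      boost::regex re("\\s+");
      boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
      boost::sregex_token_iterator j;   // default-constructed: end of sequence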

      unsigned count = 0;
      while(i != j)
      {
         cout << *i++ << endl;
         count++;
      }
      cout << "There were " << count << " tokens found." << endl;

   }while(argc == 1);
   return 0;
}

If the program is started without arguments and hello world is entered at the prompt, the output is:

hello
world
There were 2 tokens found.

Changing boost::regex re("\\s+"); to boost::regex re("\",\""); would split on quoted delimiters. Entering hello","world at the prompt would also result in:

hello
world
There were 2 tokens found.
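For anyone without Boost.Regex at hand, the same splitting idea can be sketched with the C++11 standard library, whose std::sregex_token_iterator mirrors the Boost one. This swaps in std::regex for boost::regex; split_by_regex is a made-up helper name for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the pieces of `text` that lie between matches
// of `pattern`. Submatch index -1 selects the non-matching stretches.
std::vector<std::string> split_by_regex(const std::string& text,
                                        const std::string& pattern)
{
    const std::regex re(pattern);
    std::sregex_token_iterator first(text.begin(), text.end(), re, -1);
    std::sregex_token_iterator last;   // default-constructed: end of sequence
    return std::vector<std::string>(first, last);
}
```

Calling split_by_regex with the text hello","world and the pattern "\",\"" gives the two tokens hello and world, and the pattern "\\s+" splits on whitespace as in the program above.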

But I suspect you want to handle input like "hello", "world", in which case one solution is:

  1. split on commas only
  2. then remove the quotes (possibly using boost/algorithm/string/trim.hpp or the regex library).
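The two steps above can be sketched roughly like this. To keep it self-contained it uses only the standard library instead of the Boost trim header, and split_and_unquote is a hypothetical name; it also strips the spaces around each piece, which is an assumption about what you want.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: step 1 splits on ',', step 2 strips surrounding
// spaces and double quotes from each piece.
std::vector<std::string> split_and_unquote(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string piece;
    while (std::getline(in, piece, ',')) {             // step 1: split on commas
        const std::string::size_type first = piece.find_first_not_of(" \"");
        if (first == std::string::npos) {
            fields.push_back("");                      // only spaces and quotes
            continue;
        }
        const std::string::size_type last = piece.find_last_not_of(" \"");
        fields.push_back(piece.substr(first, last - first + 1));  // step 2: trim
    }
    return fields;
}
```

Note that this approach still splits on commas that sit inside quotes, so it only suits input where quoted fields never contain the separator; for that case the escaped_list_separator tokenizer is the better fit.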

EDIT: added program output
