Using escaped_list_separator with boost split

Submitted by 感情迁移 on 2019-12-06 05:26:03

Question


I am playing around with the Boost String Algorithms library and have just come across the awesome simplicity of the split function.

  #include <boost/algorithm/string.hpp>  // boost::split, boost::is_any_of
  #include <string>
  #include <vector>
  using namespace std;

  string delimiters = ",";
  string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
  // If we didn't care about delimiter characters within a quoted section, we could use:
  vector<string> tokens;
  boost::split(tokens, str, boost::is_any_of(delimiters));
  // gives the wrong result: tokens = {"string", " with", " comma", " delimited", " tokens", " \"and delimiters", " inside a quote\""}

That would be nice and concise, but it doesn't handle delimiters inside quotes, so instead I have to do something like the following:

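#include <boost/tokenizer.hpp>  // boost::tokenizer, boost::escaped_list_separator
#include <string>
#include <vector>
using namespace std;
using namespace boost;

string delimiters = ",";
string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
vector<string> tokens;
// escape character '\', separators from `delimiters`, quote character '"'
escaped_list_separator<char> separator("\\", delimiters, "\"");
typedef tokenizer<escaped_list_separator<char> > Tokeniser;
Tokeniser t(str, separator);
for (Tokeniser::iterator it = t.begin(); it != t.end(); ++it)
    tokens.push_back(*it);
// gives the correct split (the quote characters themselves are consumed):
// tokens = {"string", " with", " comma", " delimited", " tokens", " and delimiters, inside a quote"}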

My question is: can split, or another standard algorithm, be used when there are delimiters inside quoted sections? Thanks to purpledog, but I already have a non-deprecated way of achieving the desired outcome. I just find it quite cumbersome, and unless I can replace it with a simpler, more elegant solution, I wouldn't use it in general without first wrapping it in yet another function.

EDIT: Updated code to show results and clarify question.


Answer 1:


It doesn't seem that there is any simple way to do this with boost::split. The shortest piece of code I can find is:

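#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

vector<string> tokens;
tokenizer<escaped_list_separator<char> > t(str, escaped_list_separator<char>("\\", ",", "\""));
BOOST_FOREACH(const string& s, t)  // iterate over the tokenizer declared above
    tokens.push_back(s);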

which is only marginally more verbose than the original snippet:

vector<string> tokens;  
boost::split(tokens, str, boost::is_any_of(","));



Answer 2:


This will achieve the same result as Jamie Cook's answer without the explicit loop.

#include <boost/tokenizer.hpp>

tokenizer<escaped_list_separator<char> > tok(str);
vector<string> tokens(tok.begin(), tok.end());

The tokenizer constructor's second parameter defaults to a default-constructed escaped_list_separator<char>, which is equivalent to escaped_list_separator<char>("\\", ",", "\""), so it can be omitted unless you need different escape, separator or quote characters.




Answer 3:


I don't know about the Boost String Algorithms library, but with Boost's regex_token_iterator you can express delimiters in terms of regular expressions. So yes, you can use quoted delimiters, and far more complex things as well.

Note that this used to be done with regex_split, which is now deprecated.

Here's an example taken from the Boost documentation:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, char* [])  // argv is unused; only the argument count matters
{
   string s;
   do{
      if(argc == 1)
      {
         cout << "Enter text to split (or \"quit\" to exit): ";
         getline(cin, s);
         if(s == "quit") break;
      }
      else
         s = "This is a string of tokens";

      boost::regex re("\\s+");
      boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
      boost::sregex_token_iterator j;

      unsigned count = 0;
      while(i != j)
      {
         cout << *i++ << endl;
         count++;
      }
      cout << "There were " << count << " tokens found." << endl;

   }while(argc == 1);
   return 0;
}

If the program is run without arguments and hello world is entered at the prompt, the output is:

hello
world
There were 2 tokens found.

Changing boost::regex re("\\s+"); to boost::regex re("\",\""); makes the program split on the quoted delimiter "," (quote, comma, quote). Entering hello","world at the prompt then also results in:

hello
world
There were 2 tokens found.
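The regex approach can also keep a quoted field together when it contains commas, by matching the fields themselves (submatch 0) instead of splitting on a delimiter (submatch -1). The sketch below swaps in the C++11 standard library's std::regex and std::sregex_token_iterator, which follow the same interface as their Boost.Regex counterparts; the pattern and the name match_fields are assumptions for illustration. Unlike escaped_list_separator, it keeps the quote characters, skips empty fields, and does not understand escaped quotes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Match each field instead of splitting on a delimiter. A field is either a
// double-quoted run (which may contain commas) or a run of non-comma
// characters that does not start with a quote, comma or whitespace.
std::vector<std::string> match_fields(const std::string& s)
{
    static const std::regex field(R"re("[^"]*"|[^",\s][^,]*)re");

    // Submatch 0 yields the matched text itself; -1 would yield the gaps.
    std::sregex_token_iterator it(s.begin(), s.end(), field, 0), end;
    return std::vector<std::string>(it, end);
}
```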

But I suspect you want to handle input like "hello", "world", in which case one solution is:

  1. split on commas only,
  2. then remove the surrounding quotes from each token (possibly using boost/algorithm/string/trim.hpp or the regex library).

EDIT: added program output



Source: https://stackoverflow.com/questions/890895/using-escaped-list-separator-with-boost-split
