I have this CSV line:
std::string s = R"(1997,Ford,E350,"ac, abs, moon","some "rusty" parts",3000.00)";
I can parse it using
For background on parsing (optionally) quoted delimited fields, including different quoting characters (' and "), see here:
- Parse quoted strings with boost::spirit
For a very, very, very complete example, with support for partially quoted values and a splitInto(input, output, ' '); method that takes 'arbitrary' output containers and delimiter expressions, see here:
- How to make my split work only on one real line and be capable to skip quoted parts of string?
Addressing your exact question, assuming either quoted or unquoted fields (no partial quotes inside field values), using Spirit V2:
Let's take the simplest 'abstract datatype' that could possibly work:
using Column = std::string;
using Columns = std::vector<Column>;
using CsvLine = Columns;
using CsvFile = std::vector<CsvLine>;
And with the semantics that a repeated double-quote escapes a double-quote (as I pointed out in the comment), you should be able to use something like:
static const char colsep = ',';
start = -line % eol;
line = column % colsep;
column = quoted | *~char_(colsep);
quoted = '"' >> *("\"\"" | ~char_('"')) >> '"';
The following complete test program prints
[1997][Ford][E350][ac, abs, moon][rusty][3001.00]
(Note the BOOST_SPIRIT_DEBUG define for easy debugging). See it Live on Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

using Column  = std::string;
using Columns = std::vector<Column>;
using CsvLine = Columns;
using CsvFile = std::vector<CsvLine>;

template <typename It>
struct CsvGrammar : qi::grammar<It, CsvFile(), qi::blank_type>
{
    CsvGrammar() : CsvGrammar::base_type(start)
    {
        using namespace qi;

        static const char colsep = ',';

        start  = -line % eol;              // optional lines, separated by end-of-line
        line   = column % colsep;          // columns, separated by the column separator
        column = quoted | *~char_(colsep); // either fully quoted or fully unquoted
        quoted = '"' >> *("\"\"" | ~char_('"')) >> '"';

        BOOST_SPIRIT_DEBUG_NODES((start)(line)(column)(quoted));
    }
  private:
    qi::rule<It, CsvFile(), qi::blank_type> start;
    qi::rule<It, CsvLine(), qi::blank_type> line;
    qi::rule<It, Column(),  qi::blank_type> column;
    qi::rule<It, std::string()>             quoted; // no skipper: blanks inside quotes are kept
};

int main()
{
    const std::string s = R"(1997,Ford,E350,"ac, abs, moon","""rusty""",3001.00)";

    auto f(begin(s)), l(end(s));
    CsvGrammar<std::string::const_iterator> p;

    CsvFile parsed;
    bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);

    if (ok)
    {
        for(auto& line : parsed) {
            for(auto& col : line)
                std::cout << '[' << col << ']';
            std::cout << std::endl;
        }
    } else
    {
        std::cout << "Parse failed\n";
    }

    // Report any input the grammar did not consume
    if (f!=l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
Sehe's post looks a fair bit cleaner than mine, but I was putting this together for a bit, so here it is anyways:
#include <boost/tokenizer.hpp>
#include <boost/spirit/include/qi.hpp>
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

int main() {
    const std::string s = R"(1997,Ford,E350,"ac, abs, moon",""rusty"",3000.00)";

    // Tokenizer
    typedef boost::tokenizer< boost::escaped_list_separator<char> , std::string::const_iterator, std::string> Tokenizer;
    boost::escaped_list_separator<char> seps('\\', ',', '\"'); // escape, separator, quote
    Tokenizer tok(s, seps);
    for (auto i : tok)
        std::cout << i << "\n";
    std::cout << "\n";

    // Boost Spirit Qi
    qi::rule<std::string::const_iterator, std::string()> quoted_string = '"' >> *(qi::char_ - '"') >> '"';
    qi::rule<std::string::const_iterator, std::string()> valid_characters = qi::char_ - '"' - ',';
    qi::rule<std::string::const_iterator, std::string()> item = *(quoted_string | valid_characters );
    qi::rule<std::string::const_iterator, std::vector<std::string>()> csv_parser = item % ',';

    std::string::const_iterator s_begin = s.begin();
    std::string::const_iterator s_end = s.end();
    std::vector<std::string> result;

    bool r = boost::spirit::qi::parse(s_begin, s_end, csv_parser, result);
    assert(r == true);
    assert(s_begin == s_end); // the whole input must have been consumed

    for (auto i : result)
        std::cout << i << std::endl;
    std::cout << "\n";
}
And this outputs:
1997
Ford
E350
ac, abs, moon
rusty
3000.00

1997
Ford
E350
ac, abs, moon
rusty
3000.00
Something Worth Noting: This doesn't implement a full CSV parser. You'd also want to look into escape characters or whatever else is required for your implementation.
Also: if you're looking into the documentation, just so you know, in Qi, 'a' is equivalent to boost::spirit::qi::lit('a') and "abc" is equivalent to boost::spirit::qi::lit("abc").
On double quotes: So, as Sehe notes in a comment above, it's not directly clear what the rules surrounding a "" in the input text mean. If you wanted all instances of "" not within a quoted string to be converted to a ", then something like the following would work.
qi::rule<std::string::const_iterator, std::string()> double_quote_char = "\"\"" >> qi::attr('"');
qi::rule<std::string::const_iterator, std::string()> item = *(double_quote_char | quoted_string | valid_characters );