I want to read a CSV into a struct:
struct data
{
std::string a;
std::string b;
std::string c;
};
However, I want to read even empty fields.
You just want to make sure you parse a value for "empty" strings too.
value = +(char_ - ',' - eol) | attr("(unspecified)");
entry = value >> ',' >> value >> ',' >> value >> eol;
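If it helps to see those semantics outside of Spirit, here is a minimal sketch in plain C++17 (no Boost) of what the value and entry rules accept, modelled on the demo output: leading blanks are dropped, trailing blanks are kept, and an empty field becomes "(unspecified)". The helper names field_value and parse_entry are hypothetical, and the sketch only treats '\n' as a line ending, whereas Spirit's eol also accepts "\r\n" and "\r".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct data {
    std::string a;
    std::string b;
    std::string c;
};

// One field, mirroring +(char_ - ',' - eol) | attr("(unspecified)"):
// leading blanks are dropped (the blank skipper's pre-skip), the rest is
// kept verbatim; a field with no content becomes "(unspecified)".
inline std::string field_value(std::string_view raw) {
    std::size_t i = raw.find_first_not_of(" \t");
    if (i == std::string_view::npos) return "(unspecified)";
    return std::string(raw.substr(i));
}

// Exactly three comma-separated fields followed by '\n', mirroring
// value >> ',' >> value >> ',' >> value >> eol. Returns nullopt otherwise.
inline std::optional<data> parse_entry(std::string_view line) {
    if (line.empty() || line.back() != '\n') return std::nullopt;
    line.remove_suffix(1);  // the trailing '\n' plays the role of eol
    if (line.find('\n') != std::string_view::npos) return std::nullopt;

    std::size_t p1 = line.find(',');
    if (p1 == std::string_view::npos) return std::nullopt;
    std::size_t p2 = line.find(',', p1 + 1);
    if (p2 == std::string_view::npos) return std::nullopt;
    if (line.find(',', p2 + 1) != std::string_view::npos) return std::nullopt;

    return data{field_value(line.substr(0, p1)),
                field_value(line.substr(p1 + 1, p2 - p1 - 1)),
                field_value(line.substr(p2 + 1))};
}
```

For example, parse_entry("fine,,just\n") yields 'fine', '(unspecified)', 'just', and a line with four fields is rejected.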
See the demo:
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
struct data {
std::string a;
std::string b;
std::string c;
};
BOOST_FUSION_ADAPT_STRUCT(data, (std::string, a)(std::string, b)(std::string, c))
template <typename Iterator, typename skipper = qi::blank_type>
struct google_parser : qi::grammar<Iterator, data(), skipper> {
google_parser() : google_parser::base_type(entry, "contacts") {
using namespace qi;
value = +(char_ - ',' - eol) | attr("(unspecified)");
entry = value >> ',' >> value >> ',' >> value >> eol;
BOOST_SPIRIT_DEBUG_NODES((value)(entry))
}
private:
qi::rule<Iterator, std::string()> value;
qi::rule<Iterator, data(), skipper> entry;
};
int main() {
using It = std::string::const_iterator;
google_parser<It> p;
for (std::string input : {
"something, awful, is\n",
"fine,,just\n",
"like something missing: ,,\n",
})
{
It f = input.begin(), l = input.end();
data parsed;
bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);
if (ok)
std::cout << "Parsed: '" << parsed.a << "', '" << parsed.b << "', '" << parsed.c << "'\n";
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Prints:
Parsed: 'something', 'awful', 'is'
Parsed: 'fine', '(unspecified)', 'just'
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'
However, you have a bigger problem. The assumption that qi::repeat(2) [ value ] will parse into 2 strings doesn't hold.

repeat, like operator*, operator+ and operator%, parses into a container attribute. In this case the container attribute (the std::string) also receives the input from the second value:
Parsed: 'somethingawful', 'is', ''
Parsed: 'fine(unspecified)', 'just', ''
Parsed: 'like something missing: (unspecified)', '(unspecified)', ''
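To make the container-attribute point concrete, here is a toy model in plain C++ (not Spirit's actual machinery; append_value and repeat_into are hypothetical names): every value a repeat-like construct produces is appended to the single attribute it is bound to. Appending to a std::string, which is itself a container of char, runs the values together, while a std::vector<std::string> keeps one element per value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: each parsed value is appended to the one container
// attribute it is bound to.
inline void append_value(std::string& attr, const std::string& v) {
    attr += v;          // std::string is a container of char: values run together
}

inline void append_value(std::vector<std::string>& attr, const std::string& v) {
    attr.push_back(v);  // a container of strings keeps one element per value
}

// Simulates something like repeat(2)[value] feeding a single attribute.
template <typename Container>
Container repeat_into(const std::vector<std::string>& values) {
    Container attr;
    for (const auto& v : values) append_value(attr, v);
    return attr;
}
```

With the values from the first input line, repeat_into<std::string>({"something", "awful"}) gives "somethingawful", the same run-together result as above, whereas repeat_into<std::vector<std::string>> keeps the two strings apart.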
Since this is not what you want, reconsider your data types:
- either don't adapt the struct, but instead write a customization trait to assign the fields (see http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/advanced/customize.html)
- change the struct to contain a vector of std::string to match the exposed attributes
- or create an auto-parser generator:
The auto_ approach: if you teach Qi how to extract a single value, you can use a simple rule like
entry = skip(skipper() | ',') [auto_] >> eol;
This way, Spirit itself will generate the correct number of value extractions for the given Fusion sequence!
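The same division of labour can be sketched in plain C++17, without Boost: one function knows how to extract a single string, and a variadic template applies it once per element of a std::tuple of references, so the tuple's arity decides how many fields are read. This is only an analogy for auto_ over a Fusion sequence; extract_one and parse_all are hypothetical names, and the comma is treated as skippable, mirroring skip(skipper() | ',') in the rule above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <tuple>

struct data {
    std::string a;
    std::string b;
    std::string c;
};

// The "single value" rule (hypothetical stand-in for a per-type parser):
// skip blanks and commas, as skip(blank | ',') would, then take everything up
// to the next ',' or '\n'; if nothing is there, yield "(unspecified)".
inline void extract_one(std::string_view& in, std::string& out) {
    std::size_t start = in.find_first_not_of(" \t,");
    in.remove_prefix(start == std::string_view::npos ? in.size() : start);
    std::size_t end = in.find_first_of(",\n");
    if (end == std::string_view::npos) end = in.size();
    out = end == 0 ? std::string("(unspecified)") : std::string(in.substr(0, end));
    in.remove_prefix(end);
}

// The generic part (the role auto_ plays): one extract_one call per tuple
// element, left to right, so the number of extractions follows the arity.
// Afterwards only blanks may precede the terminating '\n' (like >> eol).
template <typename... Ts>
bool parse_all(std::string_view in, std::tuple<Ts&...> fields) {
    std::apply([&in](Ts&... f) { (extract_one(in, f), ...); }, fields);
    std::size_t i = in.find_first_not_of(" \t");
    return i != std::string_view::npos && in[i] == '\n';
}
```

Usage: data d; parse_all("something, awful, is\n", std::tie(d.a, d.b, d.c)). Because the comma is skipped rather than required, "fine,,just\n" yields 'fine', 'just', '(unspecified)' in this sketch: the empty middle field does not occupy a position of its own.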
Here's a quick and dirty approach:
CAVEAT: Specializing for std::string directly like this might not be the best idea (it might not always be appropriate and might interact badly with other parsers). However, by default create_parser<std::string> is not defined (because what would it do?), so I seized the opportunity for the purpose of this demonstration:
namespace boost { namespace spirit { namespace traits {
template <> struct create_parser<std::string> {
typedef proto::result_of::deep_copy<
BOOST_TYPEOF(
qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
)
>::type type;
static type call() {
return proto::deep_copy(
qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
);
}
};
}}}
Again, see the demo output:
Parsed: 'something', 'awful', 'is'
Parsed: 'fine', 'just', '(unspecified)'
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'
NOTE: There was some advanced sorcery involved in getting the skipper to work "just right" (see skip()[] and lexeme[]). Some general explanations can be found here: Boost spirit skipper issues
As for changing the struct to contain a vector of strings: there's a subtlety to that. Two, actually. So here's a demo:
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
struct data {
std::vector<std::string> parts;
};
BOOST_FUSION_ADAPT_STRUCT(data, (std::vector<std::string>, parts))
template <typename Iterator, typename skipper = qi::blank_type>
struct google_parser : qi::grammar<Iterator, data(), skipper> {
google_parser() : google_parser::base_type(entry, "contacts") {
using namespace qi;
qi::as<std::vector<std::string> > strings;
value = +(char_ - ',' - eol) | attr("(unspecified)");
entry = strings [ repeat(2) [ value >> ',' ] >> value ] >> eol;
BOOST_SPIRIT_DEBUG_NODES((value)(entry))
}
private:
qi::rule<Iterator, std::string()> value;
qi::rule<Iterator, data(), skipper> entry;
};
int main() {
using It = std::string::const_iterator;
google_parser<It> p;
for (std::string input : {
"something, awful, is\n",
"fine,,just\n",
"like something missing: ,,\n",
})
{
It f = input.begin(), l = input.end();
data parsed;
bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);
if (ok) {
std::cout << "Parsed: ";
for (auto& part : parsed.parts)
std::cout << "'" << part << "' ";
std::cout << "\n";
}
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
The subtleties are:
- getting repeat[...] >> value to synthesize a single container /atomically/; the as<T> directive solves that here.
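As a rough, Boost-free analogy for the vector variant, the sketch below collects the fields into a temporary std::vector<std::string> and assigns it to parts in one step, only after the whole line has matched. That is loosely the idea of building the container as one unit; it is not how as<T> is implemented, and parse_parts and field_value are hypothetical names (n plays the role of repeat(2) plus the final value, so n = 3 here).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct data {
    std::vector<std::string> parts;
};

// Same field rule as the first demo: leading blanks dropped,
// empty field -> "(unspecified)".
inline std::string field_value(std::string_view raw) {
    std::size_t i = raw.find_first_not_of(" \t");
    if (i == std::string_view::npos) return "(unspecified)";
    return std::string(raw.substr(i));
}

// Reads exactly n comma-separated fields of a '\n'-terminated line into a
// temporary, then commits them to out.parts in a single assignment.
// On failure out is left untouched.
inline bool parse_parts(std::string_view line, std::size_t n, data& out) {
    if (n == 0 || line.empty() || line.back() != '\n') return false;
    line.remove_suffix(1);
    if (line.find('\n') != std::string_view::npos) return false;

    std::vector<std::string> tmp;
    for (std::size_t i = 0; i + 1 < n; ++i) {  // like repeat(n - 1)[value >> ',']
        std::size_t comma = line.find(',');
        if (comma == std::string_view::npos) return false;
        tmp.push_back(field_value(line.substr(0, comma)));
        line.remove_prefix(comma + 1);
    }
    if (line.find(',') != std::string_view::npos) return false;  // >> value >> eol
    tmp.push_back(field_value(line));

    out.parts = std::move(tmp);  // commit all fields at once
    return true;
}
```

For example, parse_parts("fine,,just\n", 3, d) fills d.parts with 'fine', '(unspecified)', 'just'; on a line with too few fields it returns false and leaves d.parts as it was.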