I seem to be experiencing some mental block with Boost Spirit I just cannot get by. I have a fairly simple grammar I need to handle, where I would like to put the values int
Notes:
with this
add_ = ( ( "add" >> attr( ' ' ) ) [ _val = "add" ] );
modify_ = ( ( "modify" >> attr( ' ' ) ) [ _val = "modify" ] );
clear_ = ( ( "clear" >> attr( ' ' ) ) [ _val = "clear" ] );
did you mean to require a space? Or are you really just trying to force the struct action
field to contain a trailing space (that's what will happen).
If you meant the latter, I'd do that outside of the parser¹.
If you wanted the first, use the kw
facility:
add_ = kw["add"] [ _val = "add" ];
modify_ = kw["modify"] [ _val = "modify" ];
clear_ = kw["clear"] [ _val = "clear" ];
In fact, you can simplify that (again, ¹):
add_ = raw[ kw["add"] ];
modify_ = raw[ kw["modify"] ];
clear_ = raw[ kw["clear"] ];
Which also means that you can simplify to
action_ = raw[ kw[lit("add")|"modify"|"clear"] ];
However, getting a bit close to your question, you could also use a symbol parser:
symbols<char> action_sym;
action_sym += "add", "modify", "clear";
//
action_ = raw[ kw[action_sym] ];
Caveat: the symbols needs to be a member so its lifetime extends beyond the constructor.
If you meant to capture the input representation of ipv4 addresses with
ipv4 = +as_string[ octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] ];
Side note I'm assuming
+as_string
is a simple mistake and you meantas_string
instead.
Simplify:
qi::uint_parser<uint8_t, 10, 1, 3> octet;
This obviates the range checks (see ¹ again):
ipv4 = as_string[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
However, this would build a 4-char binary string representation of the address. If you wanted that, fine. I doubt it (because you'd have written std::array<uint8_t, 4>
or uint64_t
, right?). So if you wanted the string, again use raw[]
:
ipv4 = raw[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
Same issue as with number 1.:
pair = identity >> -( attr(' ') >> value );
This time, the problem betrays that the productions should not be in token
; Conceptually token
-izing precedes parsing and hence I'd keep the tokens skipper-less. kw
doesn't really do a lot of good in that context. Instead, I'd move pair
, map
and list
(unused?) into the parser:
pair = kw[identity] >> -value;
map = +pair;
list = *value;
There's a very recent example I made about using symbols
to parse (here), but this answer comes a lot closer to your question:
It goes far beyond the scope of your parser because it does all kinds of actions in the grammar, but what it does show is to have generic "lookup-ish" rules that can be parameterized with a particular "symbol set": see the Identifier Lookup section of the answer:
Identifier Lookup
We store "symbol tables" in
Domain
members_variables
and_functions
:using Domain = qi::symbols<char>; Domain _variables, _functions;
Then we declare some rules that can do lookups on either of them:
// domain identifier lookups qi::_r1_type _domain; qi::rule<It, Ast::Identifier(Domain const&)> maybe_known, known,
unknown;
The corresponding declarations will be shown shortly.
Variables are pretty simple:
variable = maybe_known(phx::ref(_variables));
Calls are trickier. If a name is unknown we don't want to assume it implies a function unless it's followed by a
'('
character. However, if an identifier is a known function name, we want even to imply the(
(this gives the UX the appearance of autocompletion where when the user typessqrt
, it suggests the next character to be(
magically).// The heuristics: // - an unknown identifier followed by ( // - an unclosed argument list implies ) call %= (
known(phx::ref(_functions)) // known -> imply the parens | &(identifier >> '(') >> unknown(phx::ref(_functions)) ) >> implied('(') >> -(expression % ',') >> implied(')');
It all builds on
known
,unknown
andmaybe_known
://///////////////////////////// // identifier loopkup, suggesting { maybe_known = known(_domain) | unknown(_domain); // distinct to avoid partially-matching identifiers using boost::spirit::repository::qi::distinct; auto kw = distinct(copy(alnum | '_')); known = raw[kw[lazy(_domain)]]; unknown = raw[identifier[_val=_1]] [suggest_for(_1, _domain)]; }
I think you can use the same approach constructively here. One additional gimmick could be to validate that properties supplied are, in fact, unique.
Combining all the hints above makes it compile and "parse" the test commands:
Live On Coliru
#include <string>
#include <map>
#include <vector>
namespace ast {
//
using string = std::string;
using strings = std::vector<string>;
using list = strings;
using pair = std::pair<string, string>;
using map = std::map<string, string>;
//
struct command {
string host;
string action;
map option;
};
}
#include <boost/fusion/adapted.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::command, host, action, option)
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
namespace grammar
{
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct parser
{
struct skip : qi::grammar<It> {
skip() : skip::base_type(text) {
using namespace qi;
// handle all whitespace along with line/block comments
text = ascii::space
| (lit("#")|"--"|"//") >> *(char_ - eol) >> (eoi | eol) // line comment
| "/*" >> *(char_ - "*/") >> "*/"; // block comment
//
BOOST_SPIRIT_DEBUG_NODES((text))
}
private:
qi::rule<It> text;
};
//
struct token {
//
token() {
using namespace qi;
// common
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
value = string | identity;
// ip target
any = '*';
local = '.' | fqdn;
fqdn = +char_("a-zA-Z0-9.\\-"); // concession
ipv4 = raw [ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
//
target = any | local | fqdn | ipv4;
//
BOOST_SPIRIT_DEBUG_NODES(
(string) (identity) (value)
(any) (local) (fqdn) (ipv4) (target)
)
}
protected:
//
qi::rule<It, std::string()> string;
qi::rule<It, std::string()> identity;
qi::rule<It, std::string()> value;
qi::uint_parser<uint8_t, 10, 1, 3> octet;
qi::rule<It, std::string()> any;
qi::rule<It, std::string()> local;
qi::rule<It, std::string()> fqdn;
qi::rule<It, std::string()> ipv4;
qi::rule<It, std::string()> target;
};
//
struct test : token, qi::grammar<It, ast::command(), skip> {
//
test() : test::base_type(command_)
{
using namespace qi;
auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );
//
action_sym += "add", "modify", "clear";
action_ = raw[ kw[action_sym] ];
//
command_ = kw["test"]
>> target
>> action_
>> '(' >> map >> ')'
>> ';';
//
pair = kw[identity] >> -value;
map = +pair;
list = *value;
BOOST_SPIRIT_DEBUG_NODES(
(command_) (action_)
(pair) (map) (list)
)
}
private:
using token::target;
using token::identity;
using token::value;
qi::symbols<char> action_sym;
//
qi::rule<It, ast::command(), skip> command_;
qi::rule<It, std::string(), skip> action_;
//
qi::rule<It, ast::map(), skip> map;
qi::rule<It, ast::pair(), skip> pair;
qi::rule<It, ast::list(), skip> list;
};
};
}
#include <fstream>
int main() {
using It = boost::spirit::istream_iterator;
using Parser = grammar::parser<It>;
std::ifstream input("input.txt");
It f(input >> std::noskipws), l;
Parser::skip const s{};
Parser::test const p{};
std::vector<ast::command> data;
bool ok = phrase_parse(f, l, *p, s, data);
if (ok) {
std::cout << "Parsed " << data.size() << " commands\n";
} else {
std::cout << "Parsed failed\n";
}
if (f != l) {
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
}
Prints
Parsed 3 commands
Like in the linked answer above, let's pass the map
, pair
rules the actual key set to get their allowed values from:
using KeySet = qi::symbols<char>;
using KeyRef = KeySet const*;
//
KeySet add_keys, modify_keys, clear_keys;
qi::symbols<char, KeyRef> action_sym;
qi::rule<It, ast::pair(KeyRef), skip> pair;
qi::rule<It, ast::map(KeyRef), skip> map;
Note A key feature used is the associated attribute value with a
symbols<>
lookup (in this case we associate aKeyRef
with an action symbol):
//
add_keys += "a1", "a2", "a3", "a4", "a5", "a6";
modify_keys += "m1", "m2", "m3", "m4";
clear_keys += "c1", "c2", "c3", "c4", "c5";
action_sym.add
("add", &add_keys)
("modify", &modify_keys)
("clear", &clear_keys);
Now the heavy lifting starts.
qi::locals<>
and inherited attributesLet's give command_
some local space to store the selected keyset:
qi::rule<It, ast::command(), skip, qi::locals<KeyRef> > command_;
Now we can in principle assignt to it (using the _a
placeholder). However, there's some details:
//
qi::_a_type selected;
Always prefer descriptive names :) _a
and _r1
get old pretty quick. Things are confusing enough as it is.
command_ %= kw["test"]
>> target
>> raw[ kw[action_sym] [ selected = _1 ] ]
>> '(' >> map(selected) >> ')'
>> ';';
Note: the subtlest detail here is
%=
instead of=
to avoid the suppression of automatic attribute propagation when a semantic action is present (yeah, see ¹ again...)
But all in all, that doesn't read so bad?
//
qi::_r1_type symref;
pair = raw[ kw[lazy(*symref)] ] >> -value;
map = +pair(symref);
And now at least things parse
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <string>
#include <map>
#include <vector>
namespace ast {
//
using string = std::string;
using strings = std::vector<string>;
using list = strings;
using pair = std::pair<string, string>;
using map = std::map<string, string>;
//
struct command {
string host;
string action;
map option;
};
}
#include <boost/fusion/adapted.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::command, host, action, option)
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
namespace grammar
{
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct parser
{
struct skip : qi::grammar<It> {
skip() : skip::base_type(rule_) {
using namespace qi;
// handle all whitespace along with line/block comments
rule_ = ascii::space
| (lit("#")|"--"|"//") >> *(char_ - eol) >> (eoi | eol) // line comment
| "/*" >> *(char_ - "*/") >> "*/"; // block comment
//
//BOOST_SPIRIT_DEBUG_NODES((skipper))
}
private:
qi::rule<It> rule_;
};
//
struct token {
//
token() {
using namespace qi;
// common
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
value = string | identity;
// ip target
any = '*';
local = '.' | fqdn;
fqdn = +char_("a-zA-Z0-9.\\-"); // concession
ipv4 = raw [ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
//
target = any | local | fqdn | ipv4;
//
BOOST_SPIRIT_DEBUG_NODES(
(string) (identity) (value)
(any) (local) (fqdn) (ipv4) (target)
)
}
protected:
//
qi::rule<It, std::string()> string;
qi::rule<It, std::string()> identity;
qi::rule<It, std::string()> value;
qi::uint_parser<uint8_t, 10, 1, 3> octet;
qi::rule<It, std::string()> any;
qi::rule<It, std::string()> local;
qi::rule<It, std::string()> fqdn;
qi::rule<It, std::string()> ipv4;
qi::rule<It, std::string()> target;
};
//
struct test : token, qi::grammar<It, ast::command(), skip> {
//
test() : test::base_type(start_)
{
using namespace qi;
auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );
//
add_keys += "a1", "a2", "a3", "a4", "a5", "a6";
modify_keys += "m1", "m2", "m3", "m4";
clear_keys += "c1", "c2", "c3", "c4", "c5";
action_sym.add
("add", &add_keys)
("modify", &modify_keys)
("clear", &clear_keys);
//
qi::_a_type selected;
command_ %= kw["test"]
>> target
>> raw[ kw[action_sym] [ selected = _1 ] ]
>> '(' >> map(selected) >> ')'
>> ';';
//
qi::_r1_type symref;
pair = raw[ kw[lazy(*symref)] ] >> -value;
map = +pair(symref);
list = *value;
start_ = command_;
BOOST_SPIRIT_DEBUG_NODES(
(start_) (command_)
(pair) (map) (list)
)
}
private:
using token::target;
using token::identity;
using token::value;
using KeySet = qi::symbols<char>;
using KeyRef = KeySet const*;
//
qi::rule<It, ast::command(), skip> start_;
qi::rule<It, ast::command(), skip, qi::locals<KeyRef> > command_;
//
KeySet add_keys, modify_keys, clear_keys;
qi::symbols<char, KeyRef> action_sym;
qi::rule<It, ast::pair(KeyRef), skip> pair;
qi::rule<It, ast::map(KeyRef), skip> map;
qi::rule<It, ast::list(), skip> list;
};
};
}
#include <fstream>
int main() {
using It = boost::spirit::istream_iterator;
using Parser = grammar::parser<It>;
std::ifstream input("input.txt");
It f(input >> std::noskipws), l;
Parser::skip const s{};
Parser::test const p{};
std::vector<ast::command> data;
bool ok = phrase_parse(f, l, *p, s, data);
if (ok) {
std::cout << "Parsed " << data.size() << " commands\n";
} else {
std::cout << "Parsed failed\n";
}
if (f != l) {
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
}
Prints
Parsed 3 commands
Yeah. If you enable debug, you'll see it parses things oddly:
<attributes>[[[1, 0, ., 0, ., 0, ., 1], [c, l, e, a, r], [[[c, 1], [c, 2]], [[c, 3], []]]]]</attributes>
This is actually "merely" a problem with the grammar. If the grammar cannot see the difference between a key
and value
then obviously c2
is going to be parsed as the value of property with key c1
.
It's up to you to disambiguate the grammar. For now, I'm going to demonstrate a fix using a negative assertion: we only accept values that are not known keys. It's a bit dirty, but might be useful to you for instructional purposes:
key = raw[ kw[lazy(*symref)] ];
pair = key(symref) >> -(!key(symref) >> value);
map = +pair(symref);
Note I factored out the key
rule for readability:
Live On Coliru
Parses
<attributes>[[[1, 0, ., 0, ., 0, ., 1], [c, l, e, a, r], [[[c, 1], []], [[c, 2], []], [[c, 3], []]]]]</attributes>
Just what the doctor ordered!
¹ Boost Spirit: "Semantic actions are evil"?