I have another problem with my boost::spirit parser.
    template struct expression: qi::grammar {
        expression() : expression::base_type(expr) {
            number  %= lexeme[double_];
            varname %= lexeme[alpha >> *(alnum | '_')];

            binop = (expr >> '+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1,_2)]
                  | (expr >> '-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1,_2)]
                  | (expr >> '*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1,_2)]
                  | (expr >> '/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1,_2)]
                  ;

            expr %= number | varname | binop;
        }

        qi::rule expr;
        qi::rule binop;
        qi::rule varname;
        qi::rule number;
    };
This was my parser. It parsed "3.1415" and "var" just fine, but when I tried to parse "1+2" it told me that the parse failed. I then tried to change the binop rule to
    binop = expr >> (
          ('+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1, _2)]
        | ('-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1, _2)]
        | ('*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1, _2)]
        | ('/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1, _2)]
        );
But now it's of course not able to build the AST, because _1 and _2 are set differently. I have only seen something like _r1 mentioned, but as a Boost newbie I am not quite able to understand how boost::phoenix and boost::spirit interact.
How to solve this?
It isn't entirely clear to me what you are trying to achieve. Most importantly, are you not worried about operator associativity? I'll just show simple answers based on right-recursion; note that this parses the operators as right-associative.
The straight answer to your visible question would be to juggle a fusion::vector2, which isn't really any fun, especially in Phoenix lambda semantic actions. (I'll show below what that looks like.)
Meanwhile I think you should read up on the Spirit docs:
- here in the old Spirit docs (eliminating left recursion). Though the syntax no longer applies, Spirit still generates LL recursive descent parsers, so the concept behind left recursion still applies. The code below shows this applied to Spirit Qi.
- here: the Qi examples contain three calculator samples, which should give you a hint on why operator associativity matters, and how you would express a grammar that captures the associativity of binary operators. Obviously, they also show how to support parenthesized expressions to override the default evaluation order.
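To make the left-recursion point concrete, here is a minimal hand-written sketch in plain C++ (no Spirit). The names `parse_simple`, `parse_expr` and `group_right` are made up for this illustration, and it only handles unsigned integers without whitespace. It shows why a rule must consume input before it recurses, and what grouping a right-recursive rule produces.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical grammar, written right-recursively:
//     expr   := simple (op expr)?
//     simple := digit+
// A left-recursive rule such as  expr := expr op expr  would make parse_expr()
// call itself at the same input position before consuming anything, so a
// recursive descent parser would never make progress.

// Reads one run of digits starting at pos and advances pos past it.
std::string parse_simple(const std::string& text, std::size_t& pos) {
    const std::size_t start = pos;
    while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos])))
        ++pos;
    if (start == pos)
        throw std::runtime_error("expected a number");
    return text.substr(start, pos - start);
}

// Parses an expression and returns it with explicit parentheses.
std::string parse_expr(const std::string& text, std::size_t& pos) {
    std::string left = parse_simple(text, pos);   // always consumes input first
    if (pos < text.size() && std::string("+-*/").find(text[pos]) != std::string::npos) {
        const char op = text[pos++];
        std::string right = parse_expr(text, pos);  // recursion on the right-hand side only
        return "(" + left + op + right + ")";
    }
    return left;
}

// Convenience wrapper that also rejects trailing input.
std::string group_right(const std::string& input) {
    std::size_t pos = 0;
    std::string result = parse_expr(input, pos);
    if (pos != input.size())
        throw std::runtime_error("unexpected trailing input");
    return result;
}
```

Calling `group_right("1/2+3-4*5")` yields `(1/(2+(3-(4*5))))`: everything nests to the right and operator precedence is ignored, which is exactly why associativity deserves a second look.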
Code:
I have three versions of the code that work, parsing input like:
    std::string input("1/2+3-4*5");
into an ast::expression grouped like (using BOOST_SPIRIT_DEBUG):
    .... [[1, [2, [3, [4, 5]]]]]
The links to the code are here:
First thing, I'd get rid of the alternative parse expressions per operator; this leads to excessive backtracking [1]. Also, as you've found out, it makes the grammar hard to maintain. So, here is a simpler variation that uses a function for the semantic action:
[1] Check that using BOOST_SPIRIT_DEBUG!
    static ast::expression make_binop(char discriminant,
         const ast::expression& left, const ast::expression& right)
    {
        switch(discriminant)
        {
            case '+': return ast::binary_op<ast::add>(left, right);
            case '-': return ast::binary_op<ast::sub>(left, right);
            case '/': return ast::binary_op<ast::div>(left, right);
            case '*': return ast::binary_op<ast::mul>(left, right);
        }
        throw std::runtime_error("unreachable in make_binop");
    }

    // rules:
    number  %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];

    simple = varname | number;
    binop  = (simple >> char_("-+*/") >> expr)
        [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ];

    expr = binop | simple;
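The `ast` types themselves are not shown in this thread. If you want to try the `make_binop` dispatch in isolation, here is a minimal sketch under the assumption of a very simple, hypothetical AST: a virtual `node` interface, `ast::expression` as a `shared_ptr` to it, and a `lit` helper for number leaves. None of this is your actual AST (which quite possibly is a variant); it only illustrates mapping the operator character onto tag-parameterised node types.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

namespace ast {
    // Operator tags, named as in the grammar above.
    struct add {}; struct sub {}; struct mul {}; struct div {};

    // Hypothetical node interface; only here to make the sketch self-contained.
    struct node {
        virtual ~node() = default;
        virtual double eval() const = 0;
    };
    using expression = std::shared_ptr<const node>;

    struct number : node {
        double value;
        explicit number(double v) : value(v) {}
        double eval() const override { return value; }
    };

    // One overload per tag selects the arithmetic at compile time.
    inline double apply(add, double a, double b) { return a + b; }
    inline double apply(sub, double a, double b) { return a - b; }
    inline double apply(mul, double a, double b) { return a * b; }
    inline double apply(div, double a, double b) { return a / b; }

    template <typename Op>
    struct binary_op : node {
        expression left, right;
        binary_op(expression l, expression r)
            : left(std::move(l)), right(std::move(r)) {}
        double eval() const override {
            return apply(Op{}, left->eval(), right->eval());
        }
    };
}

// Hypothetical helper that builds a number leaf.
ast::expression lit(double v) { return std::make_shared<ast::number>(v); }

// Same shape as the semantic-action function: the operator character is the
// discriminant that decides which binary_op instantiation gets built.
ast::expression make_binop(char discriminant,
                           const ast::expression& left,
                           const ast::expression& right) {
    switch (discriminant) {
        case '+': return std::make_shared<ast::binary_op<ast::add>>(left, right);
        case '-': return std::make_shared<ast::binary_op<ast::sub>>(left, right);
        case '/': return std::make_shared<ast::binary_op<ast::div>>(left, right);
        case '*': return std::make_shared<ast::binary_op<ast::mul>>(left, right);
    }
    throw std::runtime_error("unreachable in make_binop");
}
```

With this, `make_binop('-', lit(1), make_binop('*', lit(2), lit(3)))->eval()` evaluates the right-nested tree for `1-2*3`. The point of the design is that the grammar needs a single alternative with `char_("-+*/")` and the choice of node type moves into ordinary C++.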
As you can see, this has the potential to reduce complexity. It is only a small step now to remove the binop intermediate (which has become quite redundant):
    number  %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];

    simple = varname | number;
    expr   = simple [ _val = _1 ]
        > *(char_("-+*/") > expr) [ _val = phx::bind(make_binop, qi::_1, _val, qi::_2) ]
        > eoi;
As you can see,
- within the expr rule, the _val lazy placeholder is used as a pseudo-local variable that accumulates the binops. Across rules, you'd have to use qi::locals<ast::expression> for such an approach. (This was your question regarding _r1.)
- there are now explicit expectation points, making the grammar more robust
- the expr rule no longer needs to be an auto-rule (expr = instead of expr %=)
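The accumulator idea in the first bullet can be sketched without Spirit. Note that this is a variation, not a transcription of the rule above: the loop below reads a plain number after each operator instead of recursing into the whole expression, which is what turns the accumulated result into a left-associative grouping. The function names `read_number` and `group_left` are hypothetical, and the sketch handles only unsigned integers without whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads one run of digits starting at pos.
std::string read_number(const std::string& text, std::size_t& pos) {
    const std::size_t start = pos;
    while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos])))
        ++pos;
    if (start == pos)
        throw std::runtime_error("expected a number");
    return text.substr(start, pos - start);
}

// Grammar sketched here:  expr := simple (op simple)*
// The accumulator `acc` plays the role that _val plays in the Qi rule.
std::string group_left(const std::string& text) {
    std::size_t pos = 0;
    std::string acc = read_number(text, pos);        // like  simple [ _val = _1 ]
    while (pos < text.size()) {
        const char op = text[pos];
        if (std::string("+-*/").find(op) == std::string::npos)
            throw std::runtime_error("expected an operator");
        ++pos;
        std::string right = read_number(text, pos);  // a simple operand, not a full expr
        acc = "(" + acc + op + right + ")";          // like  _val = make_binop(op, _val, right)
    }
    return acc;
}
```

For example, `group_left("1-2-3")` gives `((1-2)-3)`. Operator precedence is still ignored here; the Qi calculator samples show how separate rules per precedence level take care of that.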
Finally, for fun and gory, let me show how you could have handled your suggested code, along with the shifting bindings of _1, _2 etc.:
    static ast::expression make_binop(
            const ast::expression& left,
            const boost::fusion::vector2<char, ast::expression>& op_right)
    {
        // element 0 is the operator character, element 1 the right-hand operand
        switch(boost::fusion::get<0>(op_right))
        {
            case '+': return ast::binary_op<ast::add>(left, boost::fusion::get<1>(op_right));
            case '-': return ast::binary_op<ast::sub>(left, boost::fusion::get<1>(op_right));
            case '/': return ast::binary_op<ast::div>(left, boost::fusion::get<1>(op_right));
            case '*': return ast::binary_op<ast::mul>(left, boost::fusion::get<1>(op_right));
        }
        throw std::runtime_error("unreachable in make_binop");
    }

    // rules:
    expression::base_type(expr) {
        number  %= lexeme[double_];
        varname %= lexeme[alpha >> *(alnum | '_')];

        simple = varname | number;
        binop %= (simple >> (char_("-+*/") > expr))
            [ _val = phx::bind(make_binop, qi::_1, qi::_2) ]; // note _2!!!
        expr %= binop | simple;
As you can see, writing the make_binop function that way is not nearly as much fun!