Boost::Spirit Expression Parser

Anonymous (unverified), submitted 2019-12-03 08:59:04

Question:

I have another problem with my boost::spirit parser.

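    template <typename Iterator>
    struct expression : qi::grammar<Iterator, ast::expression(), ascii::space_type> {
        expression() :
            expression::base_type(expr) {
            number %= lexeme[double_];
            varname %= lexeme[alpha >> *(alnum | '_')];

            binop = (expr >> '+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1,_2)]
                  | (expr >> '-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1,_2)]
                  | (expr >> '*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1,_2)]
                  | (expr >> '/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1,_2)] ;

            expr %= number | varname | binop;
        }

        qi::rule<Iterator, ast::expression(), ascii::space_type> expr;
        qi::rule<Iterator, ast::expression(), ascii::space_type> binop;
        qi::rule<Iterator, std::string(), ascii::space_type> varname;
        qi::rule<Iterator, double(), ascii::space_type> number;
    };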

This is my parser. It parses "3.1415" and "var" just fine, but when I try to parse "1+2" it reports that the parse failed. I then tried changing the binop rule to

    binop = expr >>
           (('+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1, _2)]
          | ('-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1, _2)]
          | ('*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1, _2)]
          | ('/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1, _2)]);

But now it's of course not able to build the AST, because _1 and _2 are set differently. I have only seen something like _r1 mentioned, but as a boost-Newbie I am not quite able to understand how boost::phoenix and boost::spirit interact.

How to solve this?

Answer 1:

It isn't entirely clear to me what you are trying to achieve. Most importantly, are you not worried about operator associativity? I'll just show simple answers based on right recursion, which groups every operator to the right (right-associative) and ignores precedence.

The direct answer to your question as asked would be to juggle a fusion::vector2<char, ast::expression>, which isn't much fun, especially in Phoenix lambda semantic actions (Step 0 below shows what that looks like).

Meanwhile, I think you should read up on the Spirit docs:

  • The old Spirit docs have a section on eliminating left recursion. Though the syntax no longer applies, Spirit still generates LL recursive descent parsers, so the concept behind left recursion still applies. The code below shows this applied to Spirit Qi.
  • The Qi examples contain three calculator samples, which should give you a hint on why operator associativity matters, and how you would express a grammar that captures the associativity of binary operators. They also show how to support parenthesized expressions to override the default evaluation order.
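To make the associativity point concrete, here is a minimal hand-written sketch in plain C++ (no Boost) of the layered grammar style that calculator samples of this kind typically use. The names `Calculator` and `evaluate` are hypothetical, the grammar in the comment is an assumption rather than a copy of the Spirit samples, and whitespace is not skipped.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written evaluator (not Spirit code). Assumed grammar:
//   expression := term   (('+' | '-') term)*
//   term       := factor (('*' | '/') factor)*
//   factor     := number | '(' expression ')'
class Calculator {
public:
    explicit Calculator(const std::string& text) : text_(text), pos_(0) {}

    double run() {
        double value = expression();
        assert(pos_ <= text_.size());  // the cursor never runs past the input
        if (pos_ != text_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    // Lowest precedence. The loop folds operands left to right, so '+' and
    // '-' come out left-associative without any left recursion.
    double expression() {
        double value = term();
        while (pos_ < text_.size() && (text_[pos_] == '+' || text_[pos_] == '-')) {
            char op = text_[pos_++];
            double rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Higher precedence: '*' and '/' bind tighter because term() is called
    // from inside expression().
    double term() {
        double value = factor();
        while (pos_ < text_.size() && (text_[pos_] == '*' || text_[pos_] == '/')) {
            char op = text_[pos_++];
            double rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // A number, or a parenthesized expression that overrides the default order.
    double factor() {
        if (pos_ < text_.size() && text_[pos_] == '(') {
            ++pos_;
            double value = expression();
            if (pos_ >= text_.size() || text_[pos_] != ')')
                throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        std::size_t start = pos_;
        while (pos_ < text_.size() &&
               (std::isdigit(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '.'))
            ++pos_;
        if (start == pos_)
            throw std::runtime_error("expected a number");
        return std::stod(text_.substr(start, pos_ - start));
    }

    std::string text_;
    std::size_t pos_;
};

double evaluate(const std::string& text) { return Calculator(text).run(); }
```

With this layering, `evaluate("8-3-2")` groups as `(8-3)-2`, whereas a purely right-recursive grammar would group it as `8-(3-2)`.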

Code:

I have three versions of the code that work, parsing input like:

    std::string input("1/2+3-4*5");

into an ast::expression grouped like (using BOOST_SPIRIT_DEBUG):

   ....   [[1, [2, [3, [4, 5]]]]]
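The right-nested grouping can be reproduced without Spirit. The following is a rough sketch in plain C++ of a right-recursive rule of the assumed shape `expr := simple (op expr)?`; the function names are hypothetical, operands are reduced to runs of alphanumeric characters, and whitespace is not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// simple := one or more alphanumeric characters (a hypothetical
// simplification standing in for the number and varname rules)
static std::string parse_simple(const std::string& in, std::size_t& pos) {
    std::size_t start = pos;
    while (pos < in.size() && std::isalnum(static_cast<unsigned char>(in[pos])))
        ++pos;
    if (start == pos)
        throw std::runtime_error("expected an operand");
    return in.substr(start, pos - start);
}

// expr := simple (op expr)?
// The recursion sits on the right-hand side only, so the function always
// consumes an operand before it calls itself and cannot loop forever.
static std::string parse_expr(const std::string& in, std::size_t& pos) {
    std::string left = parse_simple(in, pos);
    if (pos < in.size() && std::string("+-*/").find(in[pos]) != std::string::npos) {
        ++pos;  // consume the operator; every operator is treated alike
        std::string right = parse_expr(in, pos);
        return "[" + left + ", " + right + "]";
    }
    return left;
}

// Parses the whole input and returns the nesting as text.
std::string grouping(const std::string& in) {
    std::size_t pos = 0;
    std::string result = parse_expr(in, pos);
    if (pos != in.size())
        throw std::runtime_error("unexpected trailing input");
    return result;
}
```

Here `grouping("1/2+3-4*5")` yields `[1, [2, [3, [4, 5]]]]`, the same nesting as the debug output above, which additionally wraps the result in one more pair of brackets.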

The three versions are presented in the steps that follow.

Step 1: Reduce semantic actions

First, I'd get rid of the separate alternative parse expressions per operator; they lead to excessive backtracking [1]. Also, as you've found out, they make the grammar hard to maintain. So, here is a simpler variation that uses a function for the semantic action:

[1] Check that using BOOST_SPIRIT_DEBUG!

    static ast::expression make_binop(char discriminant,
          const ast::expression& left, const ast::expression& right)
    {
        switch(discriminant)
        {
            case '+': return ast::binary_op<ast::add>(left, right);
            case '-': return ast::binary_op<ast::sub>(left, right);
            case '/': return ast::binary_op<ast::div>(left, right);
            case '*': return ast::binary_op<ast::mul>(left, right);
        }
        throw std::runtime_error("unreachable in make_binop");
    }

    // rules:
    number %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];

    simple = varname | number;
    binop = (simple >> char_("-+*/") >> expr)
        [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ];

    expr = binop | simple;
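The idea of one factory function that dispatches on the operator character can be tried out in isolation. This is a minimal sketch in plain C++, assuming a hypothetical AST (`Node`, `Expr`, `make_leaf`, `render`) that stores the operator character in the node, where the `ast` types above use a distinct `binary_op` type per operator.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical minimal AST: a node is either a leaf or a binary operation.
struct Node;
typedef std::shared_ptr<const Node> Expr;

struct Node {
    char op;           // '\0' marks a leaf
    std::string leaf;  // operand text, used only by leaves
    Expr left, right;  // children, used only by binary nodes
};

Expr make_leaf(const std::string& text) {
    Node n;
    n.op = '\0';
    n.leaf = text;
    return std::make_shared<Node>(n);
}

// One function covers all four operators, so the grammar needs a single
// semantic action instead of one alternative per operator.
Expr make_binop(char discriminant, const Expr& left, const Expr& right) {
    switch (discriminant) {
        case '+':
        case '-':
        case '*':
        case '/': {
            Node n;
            n.op = discriminant;
            n.left = left;
            n.right = right;
            return std::make_shared<Node>(n);
        }
    }
    throw std::runtime_error("unreachable in make_binop");
}

// Renders the tree with explicit parentheses so the grouping is visible.
std::string render(const Expr& e) {
    if (e->op == '\0')
        return e->leaf;
    return "(" + render(e->left) + " " + e->op + " " + render(e->right) + ")";
}
```

For example, `render(make_binop('+', make_leaf("1"), make_leaf("x")))` produces `(1 + x)`, and an operator character outside the four handled ones raises `std::runtime_error`.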

Step 2: Remove redundant rules, use _val

As you can see, this has the potential to reduce complexity. It is only a small step now, to remove the binop intermediate (which has become quite redundant):

    number %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];

    simple = varname | number;
    expr = simple [ _val = _1 ]
         > *(char_("-+*/") > expr)
                [ _val = phx::bind(make_binop, qi::_1, _val, qi::_2) ]
         > eoi;

As you can see,

  • within the expr rule, the _val lazy placeholder is used as a pseudo-local variable that accumulates the binops. Across rules, you'd have to use qi::locals<ast::expression> for such an approach. (This was your question regarding _r1.)
  • there are now explicit expectation points, making the grammar more robust
  • the expr rule no longer needs to be an auto-rule (expr = instead of expr %=)
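The accumulator and the expectation points can be imitated in a hand-written parser. The sketch below is plain C++ under stated assumptions: `ExpectationFailure`, `AccumulatingParser` and `parse_accumulating` are hypothetical names, the exception only loosely plays the role that an expectation failure plays in Qi, and the result is a parenthesized string instead of an AST.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Thrown when input after a committed point does not match.
struct ExpectationFailure : std::runtime_error {
    explicit ExpectationFailure(const std::string& what) : std::runtime_error(what) {}
};

class AccumulatingParser {
public:
    explicit AccumulatingParser(const std::string& text) : text_(text), pos_(0) {}

    // Returns false on an ordinary mismatch; throws once the parser has
    // committed and the rest of the input does not fit.
    bool parse(std::string& out) {
        if (!expr(out))
            return false;
        if (pos_ != text_.size())
            throw ExpectationFailure("expected end of input");
        return true;
    }

private:
    static bool is_operator(char c) { return c == '+' || c == '-' || c == '*' || c == '/'; }

    // simple := one or more alphanumeric characters; consumes nothing on failure.
    bool simple(std::string& out) {
        std::size_t start = pos_;
        while (pos_ < text_.size() && std::isalnum(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        out = text_.substr(start, pos_ - start);
        return pos_ != start;
    }

    // expr := simple (op > expr)*
    // 'val' is the accumulator: it starts as the first operand and is
    // rebuilt each time an operator and a right-hand side are consumed.
    bool expr(std::string& val) {
        if (!simple(val))
            return false;
        while (pos_ < text_.size() && is_operator(text_[pos_])) {
            char op = text_[pos_++];
            std::string rhs;
            // Past the operator the parser is committed: a missing operand
            // is reported as an error instead of triggering backtracking.
            if (!expr(rhs))
                throw ExpectationFailure("expected operand after operator");
            val = "(" + val + " " + op + " " + rhs + ")";
        }
        return true;
    }

    std::string text_;
    std::size_t pos_;
};

bool parse_accumulating(const std::string& text, std::string& out) {
    return AccumulatingParser(text).parse(out);
}
```

An empty input simply fails to match, while `"1+"` throws, because the operator has already been consumed when the missing operand is discovered. Since the right-hand side is still parsed recursively, `"a-b-c"` comes out as `(a - (b - c))`.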

Step 0: Wrestle fusion types directly

Finally, for fun and gory, let me show how you could have handled your suggested code, along with the shifting bindings of _1, _2 etc.:

    static ast::expression make_binop(
            const ast::expression& left,
            const boost::fusion::vector2<char, ast::expression>& op_right)
    {
        switch(boost::fusion::get<0>(op_right))
        {
            case '+': return ast::binary_op<ast::add>(left, boost::fusion::get<1>(op_right));
            case '-': return ast::binary_op<ast::sub>(left, boost::fusion::get<1>(op_right));
            case '/': return ast::binary_op<ast::div>(left, boost::fusion::get<1>(op_right));
            case '*': return ast::binary_op<ast::mul>(left, boost::fusion::get<1>(op_right));
        }
        throw std::runtime_error("unreachable in make_op");
    }

    // rules:
    expression::base_type(expr) {
    number %= lexeme[double_];
    varname %= lexeme[alpha >> *(alnum | '_')];

    simple = varname | number;
    binop %= (simple >> (char_("-+*/") > expr))
        [ _val = phx::bind(make_binop, qi::_1, qi::_2) ]; // note _2!!!

    expr %= binop | simple;

As you can see, not nearly as much fun writing the make_binop function that way!
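The same awkwardness shows up outside Spirit whenever the operator and the right operand arrive bundled in one value. As a rough analog in plain C++, with `std::pair` standing in for the fusion sequence and the hypothetical names `OpRight` and `make_binop_packed`, the callee has to unpack by index before it can do anything useful:

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for the (operator, right operand) sequence that the
// parenthesized sub-expression exposes as a single attribute.
typedef std::pair<char, std::string> OpRight;

std::string make_binop_packed(const std::string& left, const OpRight& op_right) {
    // Index-based access replaces the named parameters of the earlier version.
    const char op = std::get<0>(op_right);
    const std::string& right = std::get<1>(op_right);
    switch (op) {
        case '+':
        case '-':
        case '*':
        case '/':
            return "(" + left + " " + op + " " + right + ")";
    }
    throw std::runtime_error("unreachable in make_binop_packed");
}
```

Compared with a three-parameter function, the signature no longer says which element is the operator and which is the operand, which is the main readability cost.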


