Spirit qi parsing to an Abstract Syntax Tree for nested functions

别等时光非礼了梦想. 提交于 2019-12-23 22:07:11

问题


I am trying to create a parser using boost's spirit qi parser. It is parsing a string that contains three types of values. A constant, a variable, or a function. The functions can be nested inside of each other. The test string is f(a, b) = f(g(z, x), g(x, h(x)), c), where a-e are constants, f-r are functions, and s-z are variables. I successfully created a rule that can correctly parse the expression. The trouble arose when I changed the function parsing the rule into a grammar. There were several errors that I was able to fix. I almost got the grammar to parse the expression and turn it into an abstract syntax tree I created. However I got this error about a file contained in the boost library and I could not figure out where it is coming from because I don't understand the compiler message. I was following the example put up on the website for putting data from a parser to a struct using the employee example: http://www.boost.org/doc/libs/1_41_0/libs/spirit/example/qi/employee.cpp

main.cpp

#include "Parser.h"
#include "Term.h"

#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>
#include <list>

using std::string;
using std::cout;
using std::endl;

int main()
{
    cout << "Unification Algorithm" << endl << endl;

    string phrase = "f(a, b) = f(g(z, x), g(x, h(x)), c)";
    string::const_iterator itr = phrase.begin();
    string::const_iterator last = phrase.end();

    cout << phrase << endl;

    // Parser grammar
    Parser<string::const_iterator> g;

    // Output data
    Expression expression;

    if (phrase_parse(itr, last, g, boost::spirit::ascii::space, expression))
    {
        cout << "Expression parsed." << endl;
    }
    else
    {
        cout << "Could not parse expression." << endl;
    }
}

Parser.h

#ifndef _Parser_h_
#define _Parser_h_

#include "Term.h"

#include <boost/spirit/include/qi.hpp>
#include <vector>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

template <typename Iterator>
struct Parser : qi::grammar<Iterator, Expression(), ascii::space_type>
{
    Parser() : Parser::base_type(expression)
    {
        using qi::char_;

        const_char = char_("a-eA-E");
        fn_char = char_("f-rF-R");
        var_char = char_("s-zS-Z");

        basic_fn = fn_char >> char_('(') >> (const_char | var_char) % char_(',') >> char_(')');
        first_fn_wrapper = fn_char >> char_('(') >> (basic_fn | const_char | var_char) % char_(',') >> char_(')');
        nested_fn = fn_char >> char_('(') >> (first_fn_wrapper | const_char | var_char) % char_(',') >> char_(')');

        expression = nested_fn >> char_("=") >> nested_fn;
    }
    // Constant character a - e
    qi::rule<Iterator, T_Cons, ascii::space_type> const_char;
    // Function character f - r
    qi::rule<Iterator, char(), ascii::space_type> fn_char;
    // Variable character s - z
    qi::rule<Iterator, T_Var, ascii::space_type> var_char;
    // Allows for basic function parsing eg. f(x, y, z)
    qi::rule<Iterator, T_Fn, ascii::space_type> basic_fn;
    // Allows for single nested functions eg. f(g(x), y, z)
    qi::rule<Iterator, T_Fn, ascii::space_type> first_fn_wrapper;
    // Allows for fully nested functions eg. f(g(x, h(y)), z) and so on
    qi::rule<Iterator, T_Fn, ascii::space_type> nested_fn;
    // Full rule for a nested function expression
    qi::rule<Iterator, Expression, ascii::space_type> expression;
};
#endif // _Parser_h_

Term.h

#ifndef _Term_h_
#define _Term_h_

#include <boost/fusion/include/adapt_struct.hpp>
#include <vector>

struct Term
{
    char name;
};

BOOST_FUSION_ADAPT_STRUCT(Term, (char, name))

struct T_Cons : Term
{

};

BOOST_FUSION_ADAPT_STRUCT(T_Cons, (char, name))

struct T_Var : Term
{

};

BOOST_FUSION_ADAPT_STRUCT(T_Var, (char, name))

struct T_Fn : Term
{
    std::vector<Term> * params;

    T_Fn() { params = new std::vector<Term>(); }

    ~T_Fn() { delete params; }
};

BOOST_FUSION_ADAPT_STRUCT(T_Fn, (std::vector<Term>*, params))

struct Expression
{
    Term lh_term;
    Term rh_term;
};

 BOOST_FUSION_ADAPT_STRUCT(Expression, (char, name) (Term, lh_term) (Term, rh_term))

#endif // _Term_h_

I cannot link the entire error message from the compiler because it is extremely long, but here are the last few. These are the compile errors that it gave:

boost_1_46_0\boost\mpl\assert.hpp|360|error: no matching function for call to 'assertion_failed(mpl_::failed************ (boost::spirit::qi::grammar<Iterator, T1, T2, T3, T4>::grammar(const boost::spirit::qi::rule<Iterator_, T1_, T2_, T3_, T4_>&, const string&) [with Iterator_ = __gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >; T1_ = Expression; T2_ = boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::space, boost::spirit::char_encoding::asci|
boost_1_46_0\boost\proto\extends.hpp|540|error: use of deleted function 'boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::qi::reference<const boost::spirit::qi::rule<__gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >, Expression(), boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::space, boost::spirit::char_encoding::ascii> >, 0l>, boost::spirit::unused_type, boost::spirit::unused_type> > >, 0l>:|
boost_1_46_0\boost\proto\detail\expr0.hpp|165|error: no matching function for call to 'boost::spirit::qi::reference<const boost::spirit::qi::rule<__gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >, Expression(), boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::space, boost::spirit::char_encoding::ascii> >, 0l>, boost::spirit::unused_type, boost::spirit::unused_type> >::reference()'|

回答1:


UPDATE Showing a simplified parser with a a recursive ast parsing the sample expression shown

As always, the assertion message leads to exactly the problem:

        // If you see the assertion below failing then the start rule
        // passed to the constructor of the grammar is not compatible with
        // the grammar (i.e. it uses different template parameters).
        BOOST_SPIRIT_ASSERT_MSG(
            (is_same<start_type, rule<Iterator_, T1_, T2_, T3_, T4_> >::value)
          , incompatible_start_rule, (rule<Iterator_, T1_, T2_, T3_, T4_>));

So it tells you you should match the grammar with the start rule: you have

struct Parser : qi::grammar<Iterator, Expression(), ascii::space_type>

but

qi::rule<Iterator, Expression, ascii::space_type> expression;

Clearly you forgot parentheses there:

qi::rule<Iterator, Expression(), ascii::space_type> expression;

Guidelines when using generic libraries:

Some of these "rules" are generically applicable, with the exception of no. 2 which is specifically related to Boost Spirit:

  1. baby steps; start small (empty, even)
  2. start with the AST to match the grammar exactly
  3. build gradually,
  4. compiling every step along the way

UPDATE

Here's a much simplified grammar. As mentioned, in the "first rules of spirit" just before, start with the AST to match the grammar exactly:

namespace ast {

    namespace tag {
        struct constant;
        struct variable;
        struct function;
    }

    template <typename Tag> struct Identifier { char name; };

    using Constant = Identifier<tag::constant>;
    using Variable = Identifier<tag::variable>;
    using Function = Identifier<tag::function>;

    struct FunctionCall;

    using Expression = boost::make_recursive_variant<
        Constant,
        Variable,
        boost::recursive_wrapper<FunctionCall>
    >::type;

    struct FunctionCall {
        Function function;
        std::vector<Expression> params;
    };

    struct Equation {
        Expression lhs, rhs;
    };
}

Of course this could be much simpler still since all identifiers are just char and you could do the switching dynamically (impression).

Now, the grammar will have to follow. 1. Keep it simple 2. Format carefully 3. Match the ast directly, 4. add debug macros:

template <typename It, typename Skipper = ascii::space_type>
    struct Parser : qi::grammar<It, ast::Equation(), Skipper>
{
    Parser() : Parser::base_type(equation_)
    {
        using namespace qi;

        constant_ = qi::eps >> char_("a-eA-E");
        function_ = qi::eps >> char_("f-rF-R");
        variable_ = qi::eps >> char_("s-zS-Z");

        function_call = function_ >> '(' >> -(expression_ % ',') >> ')';
        expression_   = constant_ | variable_ | function_call;
        equation_     = expression_ >> '=' >> expression_;

        BOOST_SPIRIT_DEBUG_NODES((constant_)(function_)(variable_)(function_call)(expression_)(equation_))
    }
    qi::rule<It, ast::Constant()> constant_;
    qi::rule<It, ast::Function()> function_;
    qi::rule<It, ast::Variable()> variable_;

    qi::rule<It, ast::FunctionCall(), Skipper> function_call;
    qi::rule<It, ast::Expression(),   Skipper> expression_;
    qi::rule<It, ast::Equation(),     Skipper> equation_;
};

Note how the comments have become completely unneeded. Also note how recursively using expression_ solved your biggest headache!

Full Program

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>

namespace qi    = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

namespace ast {

    namespace tag {
        struct constant;
        struct variable;
        struct function;
    }

    template <typename Tag> struct Identifier { char name; };

    using Constant = Identifier<tag::constant>;
    using Variable = Identifier<tag::variable>;
    using Function = Identifier<tag::function>;

    struct FunctionCall;

    using Expression = boost::make_recursive_variant<
        Constant,
        Variable,
        boost::recursive_wrapper<FunctionCall>
    >::type;

    struct FunctionCall {
        Function function;
        std::vector<Expression> params;
    };

    struct Equation {
        Expression lhs, rhs;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::Constant, (char, name))
BOOST_FUSION_ADAPT_STRUCT(ast::Variable, (char, name))
BOOST_FUSION_ADAPT_STRUCT(ast::Function, (char, name))
BOOST_FUSION_ADAPT_STRUCT(ast::FunctionCall, (ast::Function, function)(std::vector<ast::Expression>, params))
BOOST_FUSION_ADAPT_STRUCT(ast::Equation, (ast::Expression, lhs)(ast::Expression, rhs))

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

template <typename It, typename Skipper = ascii::space_type>
struct Parser : qi::grammar<It, ast::Equation(), Skipper>
{
    Parser() : Parser::base_type(equation_)
    {
        using namespace qi;

        constant_ = qi::eps >> char_("a-eA-E");
        function_ = qi::eps >> char_("f-rF-R");
        variable_ = qi::eps >> char_("s-zS-Z");

        function_call = function_ >> '(' >> -(expression_ % ',') >> ')';
        expression_   = constant_ | variable_ | function_call;
        equation_     = expression_ >> '=' >> expression_;

        BOOST_SPIRIT_DEBUG_NODES((constant_)(function_)(variable_)(function_call)(expression_)(equation_))
    }
    qi::rule<It, ast::Constant()> constant_;
    qi::rule<It, ast::Function()> function_;
    qi::rule<It, ast::Variable()> variable_;

    qi::rule<It, ast::FunctionCall(), Skipper> function_call;
    qi::rule<It, ast::Expression(),   Skipper> expression_;
    qi::rule<It, ast::Equation(),     Skipper> equation_;
};

int main() {
    std::cout << "Unification Algorithm\n\n";

    std::string const phrase = "f(a, b) = f(g(z, x), g(x, h(x)), c)";
    using It = std::string::const_iterator;
    It itr = phrase.begin(), last = phrase.end();

    std::cout << phrase << std::endl;

    Parser<It> g;
    ast::Equation parsed;

    if (phrase_parse(itr, last, g, ascii::space, parsed)) {
        std::cout << "Expression parsed.\n";
    } else {
        std::cout << "Could not parse equation.\n";
    }

    if (itr != last) {
        std::cout << "Remaining unparsed input: '" << std::string(itr,last) << "'\n";
    }
}



回答2:


A vanilla C++ solution (as per popular request)

I compiled it with MSVC 2013.
Lack of unrestricted unions support lead me to duplicate the 3 possible values of an argument.

There are workarounds for this limitation, but (like so many other things in C++) they are rather messy, so I kept them out to limit code obfuscation.

#include <string>
#include <vector>
#include <iostream>
using namespace std;

// possible token types
enum tTokenType {
    T_CONST,  // s-z
    T_VAR,    // a-e
    T_FUNC,   // f-r
    T_EQUAL,  // =
    T_COMMA,  // ,
    T_OBRACE, // (
    T_CBRACE, // )
    T_SPACE,  // ' ' or '\t'
    T_ERROR,  // anything but spaces
    T_EOI     // end of input
};

// tokens
struct tToken {
    tTokenType _type;  // lexical element type
    char       _value; // the actual const/var/func letter
    size_t     _index; // position in translation unit

    static const string constants, variables, functions, spacing;
    static const char * type_name[];

    tToken(tTokenType t, size_t index) : _type(t), _value(0), _index(index) {}

    static tTokenType type(char c)
    {
        if (constants.find(c) != string::npos) return T_CONST;
        if (variables.find(c) != string::npos) return T_VAR;
        if (functions.find(c) != string::npos) return T_FUNC;
        if (spacing  .find(c) != string::npos) return T_SPACE;
        if (c == '=')                          return T_EQUAL;
        if (c == ',')                          return T_COMMA;
        if (c == '(')                          return T_OBRACE;
        if (c == ')')                          return T_CBRACE;
        return T_ERROR;
    }

    tToken(char c, size_t index) : _value(c), _index(index)
    {
        _type = type(c);
    }

    void croak(tTokenType type)
    {
        string err(_index - 1, '-');
        cerr << err << "^ expecting " << type_name[(int)type] << "\n";
    }
};
const string tToken::variables("abcde");
const string tToken::functions("fghijklmnopqr");
const string tToken::constants("stuvwxyz");
const string tToken::spacing  (" \t");
const char * tToken::type_name[] = { "constant", "variable", "function", "=", ",", "(", ")", "space", "error", "end of input" };

// parser
class Parser {
    friend class Compiler;
    string _input; // remaining program input
    size_t _index; // current token index (for error tracking)

    void skip_spaces(void)
    {
        while (_input.length() != 0 && tToken::type(_input[0]) == T_SPACE) next();
    }

    void next(void)
    {
        _input.erase(0, 1);
        _index++;
    }

public:
    void read (string program)
    {
        _input = program;
        _index = 0;
        skip_spaces();
    }

    tToken get(void)
    {
        tToken res = peek();
        next();
        skip_spaces();
        return res;
    }

    tToken peek(void)
    {
        if (_input.length() == 0) return tToken(T_EOI, _index);
        return tToken (_input[0], _index);
    }

    tToken accept(tTokenType type)
    {
        tToken t = get();
        return (t._type == type) ? t : tToken (T_ERROR, _index-1);
    }

    bool consume(tTokenType type)
    {
        tToken t = get();
        bool res = t._type == type;
        if (!res) t.croak(type);
        return res;
    }
};

// syntactic elements
struct tSyntacticElement {
    char name;
    bool valid;

    tSyntacticElement() : name('?'), valid(false) {}
    tSyntacticElement(char c) : name(c), valid(false) {}
};

class tConstant : private tSyntacticElement  {
    friend class tArgument;
    tConstant() {}
    tConstant(tToken t) : tSyntacticElement(t._value) { }
};

class tVariable : private tSyntacticElement  {
    friend class tArgument;
    tVariable() {}
    tVariable(tToken t) : tSyntacticElement(t._value) { }

};

class tFunCall : private tSyntacticElement {
    friend class Compiler;
    friend class tProgram;
    friend class tArgument;
    vector<tArgument>params;

    tFunCall() {}
    tFunCall(tToken t) : tSyntacticElement(t._value) { }
    void add_argument(tArgument a);

    string dump(void);
};

class tArgument {
    friend class Compiler;
    friend class tProgram;
    friend class tFunCall;
    tTokenType type;

    // MSVC 2013 does not support unrestricted unions, so for the 
    // sake of simplicity I'll leave it as 3 separate attributes
    tConstant c; 
    tVariable v;
    tFunCall  f;

    tArgument() {}

    tArgument(tToken val) : type(val._type)
    {
        if (val._type == T_CONST) c = val;
        if (val._type == T_VAR  ) v = val;
    }
    tArgument(tFunCall  f) : type(T_FUNC ), f(f) {}

    string dump(void)
    {
        if (type == T_VAR)   return string("$") + v.name;
        if (type == T_CONST) return string("#") + c.name;
        if (type == T_FUNC)  return f.dump();
        return "!";
    }
};

class tProgram  {
    friend class Compiler;
    tArgument left;
    tArgument right;
    bool valid;

    string dump(void) { return left.dump() + " = " + right.dump(); }
};

// syntactic analyzer

void tFunCall::add_argument(tArgument a) { params.push_back(a); }

string tFunCall::dump(void)
{
    string res(1, name);
    res += '(';
    // it's 2015 and still no implode() in C++...
    for (size_t i = 0; i != params.size(); i++)
    {
        res += params[i].dump();
        if (i != params.size() - 1) res += ',';
    }
    res += ')';
    return res;
}

class Compiler {
    Parser parser;
    tProgram program;

    tFunCall parse_function(void)
    {
        tToken f = parser.accept(T_FUNC);
        tFunCall res (f);
        parser.accept(T_OBRACE);
        for (;;)
        {
            tArgument a = parse_argument();
            res.add_argument(a);
            tToken next = parser.get();
            if (next._type == T_CBRACE) break;
            if (next._type != T_COMMA) return res;
        }
        res.valid = true;
        return res;
    }

    tArgument parse_argument(void)
    {
        tToken id = parser.peek();
        if (id._type == T_FUNC)  return parse_function();
        id = parser.get();
        if (id._type == T_CONST) return id;
        if (id._type == T_VAR)   return id;
        return tArgument(tToken (T_ERROR, id._index));
    }

public:
    void analyze(string input)
    {
        parser.read(input);
        cerr << input << "\n";

        program.left = parse_argument();
        program.valid &= parser.consume(T_EQUAL);
        program.right = parse_argument();
        program.valid &= parser.consume(T_EOI);
    }

    string dump(void)
    {
        return program.dump();
    }
};

int main(int argc, char * argv[])
{
    Compiler compiler;
//  compiler.analyze("f(a, b) = f(g(z, x), g(x, h(x)), c)");
    compiler.analyze(argv[1]);
    cout << compiler.dump();
    return 0;
}

Grammar

Given the rather terse problem definition, I invented a grammar that should at least match the test input:

program : argument = argument
argument: variable
        | constant
        | fun_call
fun_call: fun_name ( arg_list )
arg_list: argument
        | argument , arg_list

Parsing

Given the simplicity of the syntax, parsing is pretty straightforward.
Each character is basically something valid, a space or something invalid.
Spaces are silently consumed, so that the analyzer only gets useful tokens to process.

Analyze

Since I'm doing this barehanded, I simply define a function for each grammatical rule (program, fun_call, arg_list, argument).

The grammar is predictive (can't remember how it's called in posh books, LL1 maybe?) and there are no arithmetic expressions so the code is relatively lightweight.

Error reporting

Bah, just the barest minimum, and I did not really test it.

Proper error handling can easily double code size (even with yacc), so I drew the line early.

Invalid characters will be replaced by "!", and some expected symbols will be pointed at in a semblance of vintage C compilers output.

There are absolutely no re-synchronization attempts, so a typo inside a function call (especially a braces imbalance) will likely cast the rest of the translation unit to the bin.

Using the hard earned syntactic tree

The mighty compiler manages to spit out an equivalent of the input.

Just to show that something was done beside trimming white spaces, variables are preceded by a '$' and constants by a '#' (showing a deplorable lack of imagination).

Sample output

ExpressionCompiler "f(a) = z"
f(a) = z
f($a) = #z

ExpressionCompiler "f(a) = f(c,z)"
f(a) = f(c,z)
f($a) = f($c,#z)

ExpressionCompiler "f(a, b) = f(g(z, x), g(x, h(x)), c)"
f(a, b) = f(g(z, x), g(x, h(x)), c)
f($a,$b) = f(g(#z,#x),g(#x,h(#x)),$c)

ExpressionCompiler "f(a, b) + f(g(z, x), g(x, h(x)), c)"
f(a, b) + f(g(z, x), g(x, h(x)), c)
-------^ expecting =
f($a,$b) = f(g(#z,#x),g(#x,h(#x)),$c)

ExpressionCompiler "f(A, b) = f(g(z, x), g(x, h(x)), c)"
f(A, b) = f(g(z, x), g(x, h(x)), c)
f(!,$b) = f(g(#z,#x),g(#x,h(#x)),$c)

ExpressionCompiler "f(a, b) = K(g(z, x), g(x, h(x)), c)"
f(a, b) = K(g(z, x), g(x, h(x)), c)
----------^ expecting end of input
f($a,$b) = !


来源:https://stackoverflow.com/questions/28681381/spirit-qi-parsing-to-an-abstract-syntax-tree-for-nested-functions

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!