Cannot get Boost Spirit grammar to use known keys for std::map<>

暗喜 2020-12-20 03:09

I seem to be experiencing some mental block with Boost Spirit I just cannot get by. I have a fairly simple grammar I need to handle, where I would like to put the values int

1 Answer
  • 2020-12-20 03:40

    Notes:

    1. with this

              add_     = ( ( "add"    >> attr( ' ' ) ) [ _val = "add" ] );
              modify_  = ( ( "modify" >> attr( ' ' ) ) [ _val = "modify" ] );
              clear_   = ( ( "clear"  >> attr( ' ' ) ) [ _val = "clear" ] );
      

      did you mean to require a space? Or are you really just trying to force the struct action field to contain a trailing space (that's what will happen).

      If you meant the latter, I'd do that outside of the parser¹.

      If you wanted the first, use the kw facility:

              add_    = kw["add"]    [ _val = "add"    ];
              modify_ = kw["modify"] [ _val = "modify" ];
              clear_  = kw["clear"]  [ _val = "clear"  ];
      

      In fact, you can simplify that (again, ¹):

              add_    = raw[ kw["add"] ];
              modify_ = raw[ kw["modify"] ];
              clear_  = raw[ kw["clear"] ];
      

      Which also means that you can simplify to

              action_  = raw[ kw[lit("add")|"modify"|"clear"] ];
      

      However, getting a bit close to your question, you could also use a symbol parser:

              symbols<char> action_sym;
              action_sym += "add", "modify", "clear";
              //
              action_  = raw[ kw[action_sym] ];
      

      Caveat: the symbols instance needs to be a member so that its lifetime extends beyond the constructor.

    2. If you meant to capture the input representation of ipv4 addresses with

              ipv4     =  +as_string[ octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                  >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                  >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                  >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] ];
      

      Side note: I'm assuming +as_string is a simple mistake and you meant as_string instead.

      Simplify:

          qi::uint_parser<uint8_t, 10, 1, 3> octet;
      

      This obviates the range checks (see ¹ again):

          ipv4 = as_string[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
      

      However, this would build a 4-char binary string representation of the address. If you wanted that, fine. I doubt it (because then you'd have written std::array<uint8_t, 4> or uint32_t, right?). So if you wanted the string, again use raw[]:

          ipv4     = raw[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
      
    3. Same issue as with number 1.:

          pair     =  identity >> -( attr(' ') >> value );
      

      This time, the problem betrays that these productions should not be in token. Conceptually, tokenizing precedes parsing, hence I'd keep the tokens skipper-less; kw doesn't really do a lot of good in that context. Instead, I'd move pair, map and list (unused?) into the parser:

          pair     =  kw[identity] >> -value;
          map      =  +pair;
          list     =  *value;
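
    To make the second note concrete, here is a library-free C++17 sketch of what the simplified octet parser plus raw[] amount to. It is an illustration only, not Spirit code: parse_octet and match_ipv4_text are hypothetical helper names, and the "fail when the value exceeds 255" behaviour reflects my understanding of how a uint8_t-based uint_parser treats overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: consume 1 to 3 decimal digits starting at `pos`.
// Fails (leaving `pos` untouched) if there is no digit or the value exceeds 255.
inline bool parse_octet(std::string_view s, std::size_t& pos, std::uint8_t& out) {
    unsigned value = 0;
    std::size_t i = pos, digits = 0;
    while (i < s.size() && digits < 3 &&
           std::isdigit(static_cast<unsigned char>(s[i]))) {
        value = value * 10 + static_cast<unsigned>(s[i] - '0');
        ++i;
        ++digits;
    }
    if (digits == 0 || value > 255) return false;
    out = static_cast<std::uint8_t>(value);
    pos = i;
    return true;
}

// Hypothetical helper: match a dotted quad at the start of `s` and return the
// matched *text* (the role raw[] plays), not the four binary octet values.
inline std::optional<std::string> match_ipv4_text(std::string_view s) {
    std::size_t pos = 0;
    std::uint8_t octet = 0;
    for (int n = 0; n < 4; ++n) {
        if (n > 0) {
            if (pos >= s.size() || s[pos] != '.') return std::nullopt;
            ++pos; // skip the separating dot
        }
        if (!parse_octet(s, pos, octet)) return std::nullopt;
    }
    return std::string(s.substr(0, pos));
}

// Usage example.
inline void ipv4_demo() {
    assert(match_ipv4_text("10.0.0.1 clear") == std::optional<std::string>("10.0.0.1"));
    assert(!match_ipv4_text("10.0.0.256")); // last octet out of range
    assert(!match_ipv4_text("1.2.3"));      // only three octets
}
```

    The range check lives in the octet type rather than in a semantic action, which is the point of the simplification above.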
      

    Some examples

    There's a very recent example I made about using symbols to parse (here), but this answer comes a lot closer to your question:

    • How to provide user with autocomplete suggestions for given boost::spirit grammar?

    It goes far beyond the scope of your parser because it does all kinds of actions in the grammar, but what it does show is to have generic "lookup-ish" rules that can be parameterized with a particular "symbol set": see the Identifier Lookup section of the answer:

    Identifier Lookup

    We store "symbol tables" in Domain members _variables and _functions:

          using Domain = qi::symbols<char>;
          Domain _variables, _functions;
    

    Then we declare some rules that can do lookups on either of them:

          // domain identifier lookups
          qi::_r1_type _domain;
          qi::rule<It, Ast::Identifier(Domain const&)> maybe_known, known, unknown;

    The corresponding declarations will be shown shortly.

    Variables are pretty simple:

          variable   = maybe_known(phx::ref(_variables));
    

    Calls are trickier. If a name is unknown we don't want to assume it implies a function unless it's followed by a '(' character. However, if an identifier is a known function name, we even want to imply the ( (this gives the UX the appearance of autocompletion: when the user types sqrt, it magically suggests ( as the next character).

          // The heuristics:
          // - an unknown identifier followed by (
          // - an unclosed argument list implies )
          call %= (
                    known(phx::ref(_functions)) // known -> imply the parens
                  | &(identifier >> '(') >> unknown(phx::ref(_functions))
                  ) >> implied('(') >> -(expression % ',') >> implied(')');

    It all builds on known, unknown and maybe_known:

              ///////////////////////////////
              // identifier lookup, suggesting
              {
                  maybe_known = known(_domain) | unknown(_domain);
    
                  // distinct to avoid partially-matching identifiers
                  using boost::spirit::repository::qi::distinct;
                  auto kw     = distinct(copy(alnum | '_'));
    
                  known       = raw[kw[lazy(_domain)]];
                  unknown     = raw[identifier[_val=_1]] [suggest_for(_1, _domain)];
              }
    

    I think you can use the same approach constructively here. One additional gimmick could be to validate that properties supplied are, in fact, unique.
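
    For readers unfamiliar with distinct, the idea behind the known rule can be sketched in plain C++17 without Spirit: a name from the domain only counts as matched if the next character cannot continue an identifier. This is a rough illustration under that assumption; match_known and is_ident_char are hypothetical names and not part of any library.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// True for characters that may continue an identifier (alnum or '_').
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper: match one name from `domain` at the start of `input`,
// but only if the character after the match cannot continue an identifier.
inline std::optional<std::string> match_known(std::string_view input,
                                              const std::vector<std::string>& domain) {
    for (const std::string& name : domain) {
        if (input.substr(0, name.size()) != name)
            continue; // name is not a prefix of the input
        if (input.size() > name.size() && is_ident_char(input[name.size()]))
            continue; // partial match, such as "add" inside "address"
        return name;
    }
    return std::nullopt;
}

// Usage example.
inline void match_known_demo() {
    const std::vector<std::string> actions{"add", "modify", "clear"};
    assert(match_known("add (a1)", actions) == std::optional<std::string>("add"));
    assert(!match_known("address", actions)); // rejected: 'r' continues the identifier
    assert(!match_known("remove", actions));  // rejected: not in the domain
}
```

    Passing the domain as a parameter mirrors how the rules above take the symbol table as an inherited attribute instead of hard-coding it.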

    Demo Work

    Combining all the hints above makes it compile and "parse" the test commands:

    Live On Coliru

    #include <cstdint>
    #include <string>
    #include <map>
    #include <vector>
    
    namespace ast {
    
        //
        using string  = std::string;
        using strings = std::vector<string>;
        using list    = strings;
        using pair    = std::pair<string, string>;
        using map     = std::map<string, string>;
    
        //
        struct command {
            string host;
            string action;
            map option;
        };
    }
    
    #include <boost/fusion/adapted.hpp>
    
    BOOST_FUSION_ADAPT_STRUCT(ast::command, host, action, option)
    
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/repository/include/qi_distinct.hpp>
    
    namespace grammar
    {
        namespace qi = boost::spirit::qi;
        namespace qr = boost::spirit::repository::qi;
    
        template <typename It>
        struct parser
        {
            struct skip : qi::grammar<It> {
    
                skip() : skip::base_type(text) {
                    using namespace qi;
    
                    // handle all whitespace along with line/block comments
                    text = ascii::space
                        | (lit("#")|"--"|"//") >> *(char_ - eol)  >> (eoi | eol) // line comment
                        | "/*" >> *(char_ - "*/") >> "*/";         // block comment
    
                    //
                    BOOST_SPIRIT_DEBUG_NODES((text))
                }
    
              private:
                qi::rule<It> text;
            };
            //
            struct token {
                //
                token() {
                    using namespace qi;
    
                    // common
                    string   = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
                    identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
                    value    = string | identity;
    
                    // ip target
                    any      = '*';
                    local    = '.' | fqdn;
                    fqdn     = +char_("a-zA-Z0-9.\\-"); // concession
    
                    ipv4     = raw [ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
                    //
                    target   = any | local | fqdn | ipv4;
    
                    //
                    BOOST_SPIRIT_DEBUG_NODES(
                            (string) (identity) (value)
                            (any) (local) (fqdn) (ipv4) (target)
                       )
                }
    
              protected:
                //
                qi::rule<It, std::string()> string;
                qi::rule<It, std::string()> identity;
                qi::rule<It, std::string()> value;
                qi::uint_parser<uint8_t, 10, 1, 3> octet;
    
                qi::rule<It, std::string()> any;
                qi::rule<It, std::string()> local;
                qi::rule<It, std::string()> fqdn;
                qi::rule<It, std::string()> ipv4;
                qi::rule<It, std::string()> target;
            };
    
            //
            struct test : token, qi::grammar<It, ast::command(), skip> {
                //
                test() : test::base_type(command_)
                {
                    using namespace qi;
    
                    auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );
    
                    //
                    action_sym += "add", "modify", "clear";
                    action_  = raw[ kw[action_sym] ];
    
                    //
                    command_ =  kw["test"]
                            >> target
                            >> action_
                            >> '(' >> map >> ')'
                            >> ';';
    
                    //
                    pair     = kw[identity] >> -value;
                    map      = +pair;
                    list     = *value;
    
                    BOOST_SPIRIT_DEBUG_NODES(
                            (command_) (action_)
                            (pair) (map) (list)
                        )
                }
    
              private:
                using token::target;
                using token::identity;
                using token::value;
                qi::symbols<char> action_sym;
    
                //
                qi::rule<It, ast::command(), skip> command_;
                qi::rule<It, std::string(), skip> action_;
    
                //
                qi::rule<It, ast::map(), skip>  map;
                qi::rule<It, ast::pair(), skip> pair;
                qi::rule<It, ast::list(), skip> list;
            };
    
        };
    }
    
    #include <fstream>
    #include <iostream>
    
    int main() {
        using It = boost::spirit::istream_iterator;
        using Parser = grammar::parser<It>;
    
        std::ifstream input("input.txt");
        It f(input >> std::noskipws), l;
    
        Parser::skip const s{};
        Parser::test const p{};
    
        std::vector<ast::command> data;
        bool ok = phrase_parse(f, l, *p, s, data);
    
        if (ok) {
            std::cout << "Parsed " << data.size() << " commands\n";
        } else {
            std::cout << "Parse failed\n";
        }
    
        if (f != l) {
            std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
        }
    }
    

    Prints

    Parsed 3 commands
    

    Let's restrict the Keys

    As in the linked answer above, let's pass the map and pair rules the actual key set from which they take their allowed keys:

        using KeySet = qi::symbols<char>;
        using KeyRef  = KeySet const*;
        //
        KeySet add_keys, modify_keys, clear_keys;
        qi::symbols<char, KeyRef> action_sym;
    
        qi::rule<It, ast::pair(KeyRef),   skip> pair;
        qi::rule<It, ast::map(KeyRef),    skip> map;
    

    Note: a key feature used here is the value that symbols<> can associate with each entry (in this case we associate a KeyRef with each action symbol):

        //
        add_keys    += "a1", "a2", "a3", "a4", "a5", "a6";
        modify_keys += "m1", "m2", "m3", "m4";
        clear_keys  += "c1", "c2", "c3", "c4", "c5";
    
        action_sym.add
          ("add", &add_keys)
          ("modify", &modify_keys)
          ("clear", &clear_keys);
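
    The shape of that data structure can be sketched with standard containers only. This is an analogy under stated assumptions, not Spirit code: ActionTable and lookup are hypothetical names, std::set stands in for the key symbol tables and std::map for the action table that maps each action to a pointer to its key set.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the grammar members: three key sets plus an
// action table that associates each action name with a pointer to one of them.
struct ActionTable {
    using KeySet = std::set<std::string>;
    using KeyRef = const KeySet*;

    KeySet add_keys{"a1", "a2", "a3", "a4", "a5", "a6"};
    KeySet modify_keys{"m1", "m2", "m3", "m4"};
    KeySet clear_keys{"c1", "c2", "c3", "c4", "c5"};
    std::map<std::string, KeyRef> by_action{
        {"add", &add_keys}, {"modify", &modify_keys}, {"clear", &clear_keys}};

    ActionTable() = default;
    // The stored pointers refer to this object's own members, so copying
    // would leave the copy pointing into the original. Forbid it.
    ActionTable(const ActionTable&) = delete;
    ActionTable& operator=(const ActionTable&) = delete;

    // Returns the key set selected by `action`, or nullptr if it is unknown.
    KeyRef lookup(const std::string& action) const {
        auto it = by_action.find(action);
        return it == by_action.end() ? nullptr : it->second;
    }
};

// Usage example.
inline void action_table_demo() {
    ActionTable table;
    assert(table.lookup("clear") == &table.clear_keys);
    assert(table.lookup("clear")->count("c3") == 1);
    assert(table.lookup("add")->count("c1") == 0); // c1 is not an "add" key
    assert(table.lookup("delete") == nullptr);     // unknown action
}
```

    As with the symbols members in the grammar, the key sets have to outlive every pointer handed out by the table, which is why they are members of the same object.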
    

    Now the heavy lifting starts.

    Using qi::locals<> and inherited attributes

    Let's give command_ some local space to store the selected keyset:

      qi::rule<It, ast::command(), skip, qi::locals<KeyRef> > command_;
    

    Now we can in principle assign to it (using the _a placeholder). However, there are some details:

        //
        qi::_a_type selected;
    

    Always prefer descriptive names :) _a and _r1 get old pretty quick. Things are confusing enough as it is.

        command_ %= kw["test"]
                >> target
                >> raw[ kw[action_sym] [ selected = _1 ] ]
                >> '(' >> map(selected) >> ')'
                >> ';';
    

    Note: the subtlest detail here is %= instead of = to avoid the suppression of automatic attribute propagation when a semantic action is present (yeah, see ¹ again...)

    But all in all, that doesn't read so bad?

        //
        qi::_r1_type symref;
        pair     = raw[ kw[lazy(*symref)] ] >> -value;
        map      = +pair(symref);
    

    And now at least things parse.

    Almost there

    Live On Coliru

    //#define BOOST_SPIRIT_DEBUG
    #include <cstdint>
    #include <string>
    #include <map>
    #include <vector>
    
    namespace ast {
    
        //
        using string  = std::string;
        using strings = std::vector<string>;
        using list    = strings;
        using pair    = std::pair<string, string>;
        using map     = std::map<string, string>;
    
        //
        struct command {
            string host;
            string action;
            map option;
        };
    }
    
    #include <boost/fusion/adapted.hpp>
    
    BOOST_FUSION_ADAPT_STRUCT(ast::command, host, action, option)
    
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/repository/include/qi_distinct.hpp>
    
    namespace grammar
    {
        namespace qi = boost::spirit::qi;
        namespace qr = boost::spirit::repository::qi;
    
        template <typename It>
        struct parser
        {
            struct skip : qi::grammar<It> {
    
                skip() : skip::base_type(rule_) {
                    using namespace qi;
    
                    // handle all whitespace along with line/block comments
                    rule_ = ascii::space
                        | (lit("#")|"--"|"//") >> *(char_ - eol)  >> (eoi | eol) // line comment
                        | "/*" >> *(char_ - "*/") >> "*/";         // block comment
    
                    //
                    //BOOST_SPIRIT_DEBUG_NODES((skipper))
                }
    
              private:
                qi::rule<It> rule_;
            };
            //
            struct token {
                //
                token() {
                    using namespace qi;
    
                    // common
                    string   = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
                    identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
                    value    = string | identity;
    
                    // ip target
                    any      = '*';
                    local    = '.' | fqdn;
                    fqdn     = +char_("a-zA-Z0-9.\\-"); // concession
    
                    ipv4     = raw [ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
                    //
                    target   = any | local | fqdn | ipv4;
    
                    //
                    BOOST_SPIRIT_DEBUG_NODES(
                            (string) (identity) (value)
                            (any) (local) (fqdn) (ipv4) (target)
                       )
                }
    
              protected:
                //
                qi::rule<It, std::string()> string;
                qi::rule<It, std::string()> identity;
                qi::rule<It, std::string()> value;
                qi::uint_parser<uint8_t, 10, 1, 3> octet;
    
                qi::rule<It, std::string()> any;
                qi::rule<It, std::string()> local;
                qi::rule<It, std::string()> fqdn;
                qi::rule<It, std::string()> ipv4;
                qi::rule<It, std::string()> target;
            };
    
            //
            struct test : token, qi::grammar<It, ast::command(), skip> {
                //
                test() : test::base_type(start_)
                {
                    using namespace qi;
    
                    auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );
    
                    //
                    add_keys    += "a1", "a2", "a3", "a4", "a5", "a6";
                    modify_keys += "m1", "m2", "m3", "m4";
                    clear_keys  += "c1", "c2", "c3", "c4", "c5";
    
                    action_sym.add
                      ("add", &add_keys)
                      ("modify", &modify_keys)
                      ("clear", &clear_keys);
    
                    //
                    qi::_a_type selected;
    
                    command_ %= kw["test"]
                            >> target
                            >> raw[ kw[action_sym] [ selected = _1 ] ]
                            >> '(' >> map(selected) >> ')'
                            >> ';';
    
                    //
                    qi::_r1_type symref;
                    pair     = raw[ kw[lazy(*symref)] ] >> -value;
                    map      = +pair(symref);
                    list     = *value;
    
                    start_   = command_;
    
                    BOOST_SPIRIT_DEBUG_NODES(
                            (start_) (command_)
                            (pair) (map) (list)
                        )
                }
    
              private:
                using token::target;
                using token::identity;
                using token::value;
    
                using KeySet = qi::symbols<char>;
                using KeyRef  = KeySet const*;
    
                //
                qi::rule<It, ast::command(), skip> start_;
                qi::rule<It, ast::command(), skip, qi::locals<KeyRef> > command_;
    
                //
                KeySet add_keys, modify_keys, clear_keys;
                qi::symbols<char, KeyRef> action_sym;
    
                qi::rule<It, ast::pair(KeyRef),   skip> pair;
                qi::rule<It, ast::map(KeyRef),    skip> map;
                qi::rule<It, ast::list(),         skip> list;
            };
    
        };
    }
    
    #include <fstream>
    #include <iostream>
    
    int main() {
        using It = boost::spirit::istream_iterator;
        using Parser = grammar::parser<It>;
    
        std::ifstream input("input.txt");
        It f(input >> std::noskipws), l;
    
        Parser::skip const s{};
        Parser::test const p{};
    
        std::vector<ast::command> data;
        bool ok = phrase_parse(f, l, *p, s, data);
    
        if (ok) {
            std::cout << "Parsed " << data.size() << " commands\n";
        } else {
            std::cout << "Parse failed\n";
        }
    
        if (f != l) {
            std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
        }
    }
    

    Prints

    Parsed 3 commands
    

    HOLD ON, NOT SO FAST! It's wrong

    Yeah. If you enable debug, you'll see it parses things oddly:

     <attributes>[[[1, 0, ., 0, ., 0, ., 1], [c, l, e, a, r], [[[c, 1], [c, 2]], [[c, 3], []]]]]</attributes>
    

    This is actually "merely" a problem with the grammar. If the grammar cannot see the difference between a key and value then obviously c2 is going to be parsed as the value of property with key c1.

    It's up to you to disambiguate the grammar. For now, I'm going to demonstrate a fix using a negative assertion: we only accept values that are not known keys. It's a bit dirty, but might be useful to you for instructional purposes:

        key      = raw[ kw[lazy(*symref)] ];
        pair     = key(symref) >> -(!key(symref) >> value);
        map      = +pair(symref);
    

    Note: I factored out the key rule for readability.

    Live On Coliru

    Parses

    <attributes>[[[1, 0, ., 0, ., 0, ., 1], [c, l, e, a, r], [[[c, 1], []], [[c, 2], []], [[c, 3], []]]]]</attributes>
    

    Just what the doctor ordered!
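
    The effect of the negative assertion can be sketched without Spirit in plain C++17. The sketch assumes the input between the parentheses has already been split into identifier tokens; group_options is a hypothetical helper, not part of the grammar. A token that follows a key is taken as that key's value only if it is not itself a known key.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using KeySet  = std::set<std::string>;
using Options = std::map<std::string, std::string>;

// Hypothetical helper: group tokens into key/value options.
// Returns std::nullopt if the list is empty or a key position holds a token
// that is not a known key.
inline std::optional<Options> group_options(const std::vector<std::string>& tokens,
                                            const KeySet& keys) {
    if (tokens.empty()) return std::nullopt; // at least one pair is required
    Options result;
    std::size_t i = 0;
    while (i < tokens.size()) {
        if (keys.count(tokens[i]) == 0) return std::nullopt; // not a known key
        const std::string& key = tokens[i++];
        std::string value; // stays empty when the value is omitted
        // The guard: only consume the next token as a value if it is NOT a key.
        if (i < tokens.size() && keys.count(tokens[i]) == 0) value = tokens[i++];
        result.emplace(key, value);
    }
    return result;
}

// Usage example.
inline void group_options_demo() {
    const KeySet clear_keys{"c1", "c2", "c3", "c4", "c5"};
    const std::vector<std::string> tokens{"c1", "c2", "c3"};
    const auto parsed = group_options(tokens, clear_keys);
    assert(parsed && parsed->size() == 3);  // three keys, no values
    assert(parsed->at("c1").empty());       // c2 was not swallowed as c1's value
}
```

    Without the guard, the second token would always be consumed as the first key's value, which is the mis-parse shown in the earlier debug output.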


    ¹ Boost Spirit: "Semantic actions are evil"?
