boost::spirit::qi Expectation Parser and parser grouping unexpected behaviour

前端 未结 1 1307
余生分开走
余生分开走 2021-01-06 05:13

I\'m hoping someone can shine a light through my ignorance of using the > and >> operators in spirit parsing.

I have a working gra

相关标签:
1条回答
  • 2021-01-06 05:39
    1. Is it worth preventing back-tracking?

      Absolutely. Preventing back tracking in general is a proven way to improve parser performance.

      • reduce the use of (negative) lookahead (operator !, operator - and some operator &)
      • order branches (operator |, operator ||, operator^ and some operator */-/+) such that the most frequent/likely branch is ordered first, or that the most costly branch is tried last

      Using expectation points (>) does not essentially reduce backtracking: it will just disallow it. This will enable targeted error messages, prevent useless 'parsing-into-the-unknown'.

    2. Why do I need the grouping operator ()

      I'm not sure. I had a check using my what_is_the_attr helpers from here

      • ident >> op >> repeat(1,3)[any] >> "->" >> any
        synthesizes attribute:

        fusion::vector4<string, string, vector<string>, string>
        
      • ident >> op > repeat(1,3)[any] > "->" > any
        synthesizes attribute:

        fusion::vector3<fusion::vector2<string, string>, vector<string>, string>
        

      I haven't found the need to group subexpressions using parentheses (things compile), but obviously DataT needs to be modified to match the changed layout.

      typedef boost::tuple<
          boost::tuple<std::string, std::string>, 
          std::vector<std::string>, 
          std::string
      > DataT;
      

    The full code below shows how I'd prefer to do that, using adapted structs.

    1. Does my above example really stop back-tracking after operationRule is matched (I suspect not, it seems that if the entire parser inside the (...) fails backtracking will be allowed)?

      Absolutely. If the expectation(s) is not met, a qi::expectation_failure<> exception is thrown. This by default aborts the parse. You could use qi::on_error to retry, fail, accept or rethrow. The MiniXML example has very good examples on using expectation points with qi::on_error

    2. If the answer to the previous question is /no/, how do I construct a rule that allows backtracking if operation is /not/ matched, but does not allow backtracking once operation /is/ matched?

    3. Why does the grouping operator destroy the attribute grammar - requiring actions?

      It doesn't destroy the attribute grammar, it just changes the exposed type. So, if you bind an appropriate attribute reference to the rule/grammar, you won't need semantic actions. Now, I feel there should be ways to go without the grouping , so let me try it (preferrably on your short selfcontained sample). And indeed I have found no such need. I've added a full example to help you see what is happening in my testing, and not using semantic actions.

    Full Code

    The full code show 5 scenarios:

    • OPTION 1: Original without expectations

      (no relevant changes)

    • OPTION 2: with expectations

      Using the modified typedef for DataT (as shown above)

    • OPTION 3: adapted struct, without expectations

      Using a userdefined struct with BOOST_FUSION_ADAPT_STRUCT

    • OPTION 4: adapted struct, with expectations

      Modifying the adapted struct from OPTION 3

    • OPTION 5: lookahead hack

      This one leverages a 'clever' (?) hack, by making all >> into expectations, and detecting the presence of a operationRule-match beforehand. This is of course suboptimal, but allows you to keep DataT unmodified, and without using semantic actions.

    Obviously, define OPTION to the desired value before compiling.

    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/karma.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/fusion/adapted.hpp>
    #include <iostream>
    
    namespace qi    = boost::spirit::qi; 
    namespace karma = boost::spirit::karma; 
    
    #ifndef OPTION
    #define OPTION 5
    #endif
    
    #if OPTION == 1 || OPTION == 5 // original without expectations (OR lookahead hack)
        typedef boost::tuple<std::string, std::string, std::vector<std::string>, std::string> DataT;
    #elif OPTION == 2 // with expectations
        typedef boost::tuple<boost::tuple<std::string, std::string>, std::vector<std::string>, std::string> DataT;
    #elif OPTION == 3 // adapted struct, without expectations
        struct DataT
        {
            std::string identifier, operation;
            std::vector<std::string> values;
            std::string destination;
        };
    
        BOOST_FUSION_ADAPT_STRUCT(DataT, (std::string, identifier)(std::string, operation)(std::vector<std::string>, values)(std::string, destination));
    #elif OPTION == 4 // adapted struct, with expectations
        struct IdOpT
        {
            std::string identifier, operation;
        };
        struct DataT
        {
            IdOpT idop;
            std::vector<std::string> values;
            std::string destination;
        };
    
        BOOST_FUSION_ADAPT_STRUCT(IdOpT, (std::string, identifier)(std::string, operation));
        BOOST_FUSION_ADAPT_STRUCT(DataT, (IdOpT, idop)(std::vector<std::string>, values)(std::string, destination));
    #endif
    
    template <typename Iterator>
    struct test_parser : qi::grammar<Iterator, DataT(), qi::space_type, qi::locals<char> >
    {
        test_parser() : test_parser::base_type(test, "test")
        {
            using namespace qi;
    
            quoted_string = 
                   omit    [ char_("'\"") [_a =_1] ]             
                >> no_skip [ *(char_ - char_(_a))  ]
                 > lit(_a); 
    
            any_string = quoted_string | +qi::alnum;
    
            identifier = lexeme [ alnum >> *graph ];
    
            operationRule = string("add") | "sub";
            arrow = "->";
    
    #if OPTION == 1 || OPTION == 3   // without expectations
            test = identifier >> operationRule >> repeat(1,3)[any_string] >> arrow >> any_string;
    #elif OPTION == 2 || OPTION == 4 // with expectations
            test = identifier >> operationRule  > repeat(1,3)[any_string]  > arrow  > any_string;
    #elif OPTION == 5                // lookahead hack
            test = &(identifier >> operationRule) > identifier > operationRule > repeat(1,3)[any_string] > arrow > any_string;
    #endif
        }
    
        qi::rule<Iterator, qi::space_type/*, qi::locals<char> */> arrow;
        qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> operationRule;
        qi::rule<Iterator, std::string(), qi::space_type/*, qi::locals<char> */> identifier;
        qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
        qi::rule<Iterator, DataT(),       qi::space_type, qi::locals<char> > test;
    };
    
    int main()
    {
        std::string str("addx001 add 'str1'   \"str2\"       ->  \"str3\"");
        test_parser<std::string::const_iterator> grammar;
        std::string::const_iterator iter = str.begin();
        std::string::const_iterator end  = str.end();
    
        DataT data;
        bool r = phrase_parse(iter, end, grammar, qi::space, data);
    
        if (r)
        {
            using namespace karma;
            std::cout << "OPTION " << OPTION << ": " << str << " --> ";
    #if OPTION == 1 || OPTION == 3 || OPTION == 5 // without expectations (OR lookahead hack)
            std::cout << format(delimit[auto_ << auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
    #elif OPTION == 2 || OPTION == 4 // with expectations
            std::cout << format(delimit[auto_ << '[' << auto_ << ']' << " --> " << auto_], data) << "\n";
    #endif
        }
        if (iter!=end)
            std::cout << "Remaining: " << std::string(iter,end) << "\n";
    }
    

    Output for all OPTIONS:

    for a in 1 2 3 4 5; do g++ -DOPTION=$a -I ~/custom/boost/ test.cpp -o test$a && ./test$a; done
    OPTION 1: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
    OPTION 2: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
    OPTION 3: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
    OPTION 4: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
    OPTION 5: addx001 add 'str1'   "str2"       ->  "str3" --> addx001 add [ str1 str2 ]  -->  str3 
    
    0 讨论(0)
提交回复
热议问题