Boost Spirit Signals Successful Parsing Despite Token Being Incomplete

隐瞒了意图╮ 2020-12-21 04:44

I have a very simple path construct that I am trying to parse with Boost Spirit.Lex.

We have the following grammar:

token := [a-z]+
path := (token :          


        
3 Answers
  • 2020-12-21 05:02

    This is what I finally ended up with. It uses the suggestions from both @sehe and @llonesmiz. Note the conversion to std::wstring and the use of actions in the grammar definition, which were not present in the original post.

    #include <boost/config/warning_disable.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/lex_lexertl.hpp>
    #include <boost/spirit/include/phoenix_operator.hpp>
    #include <boost/bind.hpp>
    
    #include <iostream>
    #include <string>
    #include <vector>
    
    //
    // This example uses boost spirit to parse a simple
    // colon-delimited grammar.
    //
    // The grammar we want to recognize is:
    //    identifier := [a-z]+
    //    separator = :
    //    path= (identifier separator path) | identifier
    //
    // From the boost spirit perspective this example shows
    // a few things I found hard to come by when building my
    // first parser.
    //    1. How to flag an incomplete token at the end of input
    //       as an error. (use of boost::spirit::eoi)
    //    2. How to bind an action on an instance of an object
    //       that is taken as input to the parser.
    //    3. Use of std::wstring.
    //    4. Use of the lexer iterator.
    //
    
    // This using directive will cause issues with boost::bind
    // when referencing placeholders such as _1.
    // using namespace boost::spirit;
    
    //! A class that tokenizes our input.
    template<typename Lexer>
    struct Tokens : boost::spirit::lex::lexer<Lexer>
    {
          Tokens()
          {
             identifier = L"[a-z]+";
             separator = L":";
    
             this->self.add
                (identifier)
                (separator)
                ;
          }
          boost::spirit::lex::token_def<std::wstring, wchar_t> identifier, separator;
    };
    
    //! This class provides a callback that echoes strings to stderr.
    struct Echo
    {
          void echo(boost::fusion::vector<std::wstring> const& t) const
          {
             using namespace boost::fusion;
             std::wcerr << at_c<0>(t) << "\n";
          }
    };
    
    
    //! The definition of our grammar, as described above.
    template <typename Iterator>
    struct Grammar : boost::spirit::qi::grammar<Iterator> 
    {
          template <typename TokenDef>
          Grammar(TokenDef const& tok, Echo const& e)
             : Grammar::base_type(path)
          {
             using boost::spirit::_val;
             path
                = 
                ((token >> tok.separator >> path)[boost::bind(&Echo::echo, &e, ::_1)]
                 |
                 (token)[boost::bind(&Echo::echo, &e, ::_1)]
                 ) >> boost::spirit::eoi; // Look for end of input.
    
              token 
                 = (tok.identifier) [_val=boost::spirit::qi::_1]
              ;
    
          }
          boost::spirit::qi::rule<Iterator> path;
          boost::spirit::qi::rule<Iterator, std::wstring()> token;
    };
    
    
    int main()
    {
       // A set of typedefs to make things a little clearer. This stuff is
       // well described in the boost spirit documentation/examples.
       typedef std::wstring::iterator BaseIteratorType;
       typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::wstring> > TokenType;
       typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
       typedef Tokens<LexerType>::iterator_type TokensIterator;
       typedef LexerType::iterator_type LexerIterator;
    
       // Define some paths to parse.
       typedef std::vector<std::wstring> Tests;
       Tests paths;
       paths.push_back(L"abc");
       paths.push_back(L"abc:xyz");
       paths.push_back(L"abc:xyz:");
       paths.push_back(L":");
    
       // Parse 'em.
       for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
       {
          std::wstring str = *iter;
          std::wcerr << L"*****" << str << L"*****\n";
    
          Echo e;
          Tokens<LexerType> tokens;
          Grammar<TokensIterator> grammar(tokens, e);
    
          BaseIteratorType first = str.begin();
          BaseIteratorType last = str.end();
    
          // Have the lexer consume our string.
          LexerIterator lexFirst = tokens.begin(first, last);
          LexerIterator lexLast = tokens.end();
    
          // Have the parser consume the output of the lexer.
          bool r = boost::spirit::qi::parse(lexFirst, lexLast, grammar);
    
          // Print the status and whether or not all output of the lexer
          // was processed.
          std::wcerr << r << L" " << (lexFirst==lexLast) << L"\n";
       }
    }
    
  • 2020-12-21 05:18

    The problem lies in the meaning of first and last after your call to tokenize_and_parse. Checking first==last only tells you whether your string has been completely tokenized; it says nothing about whether the grammar matched the whole token stream. If you isolate the parsing like this, you obtain the expected result:

      PathTokens<LexerType> tokens;
      PathGrammar<TokensIterator> grammar(tokens);
    
      BaseIteratorType first = str.begin();
      BaseIteratorType last = str.end();
    
      LexerType::iterator_type lexfirst = tokens.begin(first,last);
      LexerType::iterator_type lexlast = tokens.end();
    
    
      bool r = parse(lexfirst, lexlast, grammar);
    
      std::cerr << r << " " << (lexfirst==lexlast) << "\n";
    
  • 2020-12-21 05:23

    In addition to what llonesmiz already said, here's a trick using qi::eoi that I sometimes use:

    path = (
               (token >> tok.separator >> path) [std::cerr << _1 << "\n"]
             | token                           [std::cerr << _1 << "\n"]
        ) >> eoi;
    

    This makes the grammar require eoi (end-of-input) at the end of a successful match. This leads to the desired result:

    http://liveworkspace.org/code/23a7adb11889bbb2825097d7c553f71d

    *****abc*****
    abc
    1 1
    *****abc:xyz*****
    xyz
    abc
    1 1
    *****abc:xyz:*****
    xyz
    abc
    0 1
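
For readers who want to see why "abc:xyz:" tokenizes completely and still should not count as a path, here is a minimal sketch that uses only the standard library and no Spirit at all. The names Kind, tokenize, match_path_prefix and match_whole_path are hypothetical and exist only for this illustration. The prefix matcher behaves roughly like the grammar without eoi (it succeeds and leaves the dangling separator unconsumed), while the whole-input matcher adds the "no tokens left" requirement that eoi expresses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; not part of Boost.Spirit.
enum class Kind { Identifier, Separator };

// Split the input into [a-z]+ identifiers and ':' separators.
// Returns false if a character belongs to neither token.
bool tokenize(const std::string& input, std::vector<Kind>& out) {
    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] == ':') {
            out.push_back(Kind::Separator);
            ++i;
        } else if (input[i] >= 'a' && input[i] <= 'z') {
            while (i < input.size() && input[i] >= 'a' && input[i] <= 'z') ++i;
            out.push_back(Kind::Identifier);
        } else {
            return false;
        }
    }
    return true;
}

// path := (identifier separator path) | identifier
// Matches a path as a prefix of the token sequence. On success, pos
// points just past the matched prefix; trailing tokens may remain.
bool match_path_prefix(const std::vector<Kind>& toks, std::size_t& pos) {
    if (pos >= toks.size() || toks[pos] != Kind::Identifier) return false;
    ++pos;
    const std::size_t save = pos;
    if (pos < toks.size() && toks[pos] == Kind::Separator) {
        ++pos;
        if (match_path_prefix(toks, pos)) return true;
        pos = save;  // backtrack: leave the dangling separator unconsumed
    }
    return true;
}

// Same match, but also require that no tokens remain afterwards.
// This is the extra condition that eoi adds to the Spirit grammar.
bool match_whole_path(const std::vector<Kind>& toks) {
    std::size_t pos = 0;
    return match_path_prefix(toks, pos) && pos == toks.size();
}
```

With these helpers, "abc:xyz:" tokenizes into four tokens, match_path_prefix succeeds after consuming three of them, and match_whole_path rejects the input because one separator is left over.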
    