Read empty values with boost::spirit

死守一世寂寞 2021-01-23 19:43

I want to read a CSV into a struct:

struct data 
{
   std::string a;
   std::string b;
   std::string c;
};

However, I want to read even empty values.

1 answer
  • 2021-01-23 19:55

    You just want to make sure you parse a value for "empty" strings too.

    value = +(char_ - ',' - eol) | attr("(unspecified)");
    entry = value >> ',' >> value >> ',' >> value >> eol;
    

    See the demo:


    //#define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    
    namespace qi = boost::spirit::qi;
    
    struct data {
        std::string a;
        std::string b;
        std::string c;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(data, (std::string, a)(std::string, b)(std::string, c))
    
    template <typename Iterator, typename skipper = qi::blank_type>
    struct google_parser : qi::grammar<Iterator, data(), skipper> {
        google_parser() : google_parser::base_type(entry, "contacts") {
            using namespace qi;
    
            value = +(char_ - ',' - eol) | attr("(unspecified)");
            entry = value >> ',' >> value >> ',' >> value >> eol;
    
            BOOST_SPIRIT_DEBUG_NODES((value)(entry))
        }
      private:
        qi::rule<Iterator, std::string()> value;
        qi::rule<Iterator, data(), skipper> entry;
    };
    
    int main() {
        using It = std::string::const_iterator;
        google_parser<It> p;
    
        for (std::string input : { 
                "something, awful, is\n",
                "fine,,just\n",
                "like something missing: ,,\n",
            })
        {
            It f = input.begin(), l = input.end();
    
            data parsed;
            bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);
    
            if (ok)
                std::cout << "Parsed: '" << parsed.a << "', '" << parsed.b << "', '" << parsed.c << "'\n";
            else
                std::cout << "Parse failed\n";
    
            if (f!=l)
                std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
        }
    }
    

    Prints:

    Parsed: 'something', 'awful', 'is'
    Parsed: 'fine', '(unspecified)', 'just'
    Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'
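
    To sanity-check the output above without Boost, the same field rules can be written against the standard library alone. This is a sketch, not Spirit code: split_three is a hypothetical helper, and it assumes the behaviour visible in the printed results (leading blanks dropped, trailing blanks kept, empty fields replaced by the placeholder, a mandatory trailing newline).

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical reference helper (NOT part of Boost.Spirit).
// Splits one CSV line into exactly three fields, following the rules the
// grammar above appears to apply:
//   - the line must end in '\n' (an optional '\r' before it is tolerated),
//   - leading spaces/tabs of a field are dropped, trailing ones are kept,
//   - an empty field becomes `placeholder`.
// Returns false (and leaves `out` untouched) if the line does not fit.
bool split_three(std::string line,
                 std::array<std::string, 3>& out,
                 const std::string& placeholder = "(unspecified)") {
    if (line.empty() || line.back() != '\n') return false;
    line.pop_back();
    if (!line.empty() && line.back() == '\r') line.pop_back();

    std::array<std::string, 3> fields;
    std::size_t start = 0;
    for (std::size_t i = 0; i < 3; ++i) {
        const bool last = (i == 2);
        const std::size_t comma = line.find(',', start);
        // The first two fields need a comma; the last one must not have one.
        if (last != (comma == std::string::npos)) return false;
        const std::size_t end = last ? line.size() : comma;

        // Drop leading blanks only.
        while (start < end && (line[start] == ' ' || line[start] == '\t'))
            ++start;

        fields[i] = (start == end) ? placeholder
                                   : line.substr(start, end - start);
        start = end + 1;
    }
    out = fields;
    return true;
}
```

    For example, split_three("fine,,just\n", out) returns true and leaves out[1] set to "(unspecified)", which matches the second line printed by the Spirit demo.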
    

    However, you have a bigger problem. The assumption that qi::repeat(2) [ value ] will parse into 2 strings doesn't work.

    repeat, like operator*, operator+ and operator%, parses into a container attribute. In this case the container attribute (a std::string) also receives the input from the second value:


    Parsed: 'somethingawful', 'is', ''
    Parsed: 'fine(unspecified)', 'just', ''
    Parsed: 'like something missing: (unspecified)', '(unspecified)', ''
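
    The concatenation is easier to see with a plain C++ analogy. The sketch below is not how Spirit is implemented, and append and collect are hypothetical names. It only shows that a std::string is itself a container (of char), so appending two parsed values to it merges them, while a std::vector<std::string> keeps them apart.

```cpp
#include <string>
#include <vector>

// Hypothetical analogy (NOT Spirit's implementation).
// Appending a parsed value to a std::string adds its characters one by one,
// because the string's element type is char.
inline void append(std::string& target, const std::string& value) {
    target.insert(target.end(), value.begin(), value.end());
}

// Appending the same value to a vector of strings adds one whole element.
inline void append(std::vector<std::string>& target, const std::string& value) {
    target.push_back(value);
}

// Mimics a repeating parser that feeds every matched value into a single
// container attribute of type Container.
template <typename Container>
Container collect(const std::vector<std::string>& values) {
    Container out;
    for (const auto& v : values)
        append(out, v);
    return out;
}
```

    With this sketch, collect<std::string>({"something", "awful"}) yields the single string "somethingawful", whereas collect<std::vector<std::string>>(...) on the same input yields two separate elements.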
    

    Since this is not what you want, reconsider your data types:

    • either don't adapt the struct but instead write a customization trait to assign the fields (see http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/advanced/customize.html)

    • change the struct to contain a vector of std::string to match the exposed attributes

    • or create an auto-parser generator:

    The auto_ approach:

    If you teach Qi how to extract a single value, you can use a simple rule like

    entry = skip(skipper() | ',') [auto_] >> eol;
    

    This way, Spirit itself will generate the correct number of value extractions for the given Fusion sequence!

    Here's a quick and dirty approach:

    CAVEAT Specializing for std::string directly like this might not be the best idea (it might not always be appropriate and might interact badly with other parsers). However, by default create_parser<std::string> is not defined (because, what would it do?) so I seized the opportunity for the purpose of this demonstration:

    namespace boost { namespace spirit { namespace traits {
        template <> struct create_parser<std::string> {
            typedef proto::result_of::deep_copy<
                BOOST_TYPEOF(
                    qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
                )
            >::type type;
    
            static type call() {
                return proto::deep_copy(
                    qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
                );
            }
        };
    }}}
    

    Again, see the demo output:


    Parsed: 'something', 'awful', 'is'
    Parsed: 'fine', 'just', '(unspecified)'
    Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'
    

    NOTE There was some advanced sorcery to get the skipper to work "just right" (see skip()[] and lexeme[]). Some general explanations can be found here: Boost spirit skipper issues

    UPDATE

    The Container Approach

    There's a subtlety to that. Two actually. So here's a demo:


    //#define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <vector>
    
    namespace qi = boost::spirit::qi;
    
    struct data {
        std::vector<std::string> parts;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(data, (std::vector<std::string>, parts))
    
    template <typename Iterator, typename skipper = qi::blank_type>
    struct google_parser : qi::grammar<Iterator, data(), skipper> {
        google_parser() : google_parser::base_type(entry, "contacts") {
            using namespace qi;
            qi::as<std::vector<std::string> > strings;
    
            value = +(char_ - ',' - eol) | attr("(unspecified)");
            entry = strings [ repeat(2) [ value >> ',' ] >> value ] >> eol;
    
            BOOST_SPIRIT_DEBUG_NODES((value)(entry))
        }
      private:
        qi::rule<Iterator, std::string()> value;
        qi::rule<Iterator, data(), skipper> entry;
    };
    
    int main() {
        using It = std::string::const_iterator;
        google_parser<It> p;
    
        for (std::string input : { 
                "something, awful, is\n",
                "fine,,just\n",
                "like something missing: ,,\n",
            })
        {
            It f = input.begin(), l = input.end();
    
            data parsed;
            bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);
    
            if (ok) {
                std::cout << "Parsed: ";
                for (auto& part : parsed.parts) 
                    std::cout << "'" << part << "' ";
                std::cout << "\n";
            }
            else
                std::cout << "Parse failed\n";
    
            if (f!=l)
                std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
        }
    }
    

    The subtleties are:

    • adapting a single-element sequence hits edge cases with automatic attribute handling: Spirit Qi attribute propagation issue with single-member struct
    • Spirit needs hand-holding in this particular case to treat the repeat[...]>>value as synthesizing a single container /atomically/. The as<T> directive solves that here