stringstream unsigned input validation

旧时难觅i 2021-01-13 03:13

I'm writing part of a program that parses and validates user input passed as console arguments. I chose to use stringstream for that purpose, but encountered a problem

2 answers
  • 2021-01-13 03:36

    Version disclaimer: The answer is different for C++03. The following deals with C++11.

    First, let's analyse what's happening.

    The statement ss >> res; calls std::istream::operator>>(unsigned). In [istream.formatted.arithmetic]/1, the effects are defined as follows:

    These extractors behave as formatted input functions (as described in 27.7.2.2.1). After a sentry object is constructed, the conversion occurs as if performed by the following code fragment:

    typedef num_get< charT,istreambuf_iterator<charT,traits> > numget;
    iostate err = iostate::goodbit;
    use_facet< numget >(loc).get(*this, 0, *this, err, val);
    setstate(err);
    

    In the above fragment, loc stands for the private member of the basic_ios class.
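    To make the quoted fragment concrete, here is a minimal sketch that performs the same num_get call by hand on a string. The helper name extract_unsigned and the function num_get_example are made up for illustration; the stream's own locale (in.getloc()) stands in for loc, and no sentry is constructed, so nothing skips leading white space for us.

```cpp
#include <cassert>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: runs the num_get step of operator>>(unsigned) by hand.
// Returns the iostate that operator>> would hand to setstate().
std::ios_base::iostate extract_unsigned(const std::string& text, unsigned& val)
{
    typedef std::istreambuf_iterator<char> iter;
    typedef std::num_get<char, iter> numget;

    std::istringstream in(text);
    std::ios_base::iostate err = std::ios_base::goodbit;

    // Same call as in the quoted fragment; in.getloc() plays the role of loc.
    std::use_facet<numget>(in.getloc()).get(iter(in), iter(), in, err, val);
    return err;
}

// Small usage example: a plain decimal number converts without failbit.
void num_get_example()
{
    unsigned val = 0;
    std::ios_base::iostate err = extract_unsigned("42", val);
    assert(val == 42);
    assert((err & std::ios_base::failbit) == 0);
}
```

    Calling num_get_example() from main() exercises the sketch. Input that is not a number at all, such as "abc", comes back with failbit set in the returned state.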

    Following the reference from formatted input functions to [istream::sentry], the main effect of the sentry object here is to consume leading white-space characters. It also prevents execution of the code shown above if the stream is already in an error state (failed or eof).

    The used locale is the "C" locale. Rationale:

    For the stringstream constructed via stringstream ss(s);, the locale of the stream is the global locale at the time of construction (that's guaranteed deep down in the rabbit hole at [ios.base.locales]/4). As the global locale hasn't been changed in the OP's program, [locale.cons]/2 specifies that it is the "classic" locale, i.e. the "C" locale.
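    The locale claim can be checked directly. The following is a small sketch, assuming (as in the OP's program) that std::locale::global has not been called; the function names are made up for illustration.

```cpp
#include <cassert>
#include <locale>
#include <sstream>

// Hypothetical helper: true if a freshly constructed stringstream
// carries the classic ("C") locale.
bool new_stream_uses_classic_locale()
{
    std::stringstream ss("123");
    return ss.getloc() == std::locale::classic();
}

// Usage example; only valid while the global locale is still the default.
void locale_example()
{
    assert(std::locale() == std::locale::classic());
    assert(new_stream_uses_classic_locale());
}
```

    If the program had installed a different global locale before constructing the stream, the first assertion would no longer hold and the stream would pick up that locale instead.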

    use_facet< numget >(loc).get uses the member function num_get<char>::get(iter_type in, iter_type end, ios_base&, ios_base::iostate& err, unsigned int& v) const; specified in [locale.num.get] (note the unsigned int, everything is still fine). The details of the string -> unsigned int conversion for the "C" locale are lengthy and described in [facet.num.get.virtuals]. Some interesting details:

    • For an unsigned integer value, the function strtoull is used.
    • If the conversion fails, ios_base::failbit is assigned to err. Specifically: "The numeric value to be stored can be one of: [...] the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err."

    We need to go to C99, 7.20.1.4 for the definition of strtoull, under paragraph 5:

    If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).

    and under paragraph 8:

    If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno

    It seems that it has been debated in the past whether negative values are valid input for strtoul. In any case, the problem lies with this function. A quick check with gcc shows that negative input is treated as valid, hence the behaviour you observed.
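    The effect of paragraph 5 can be seen by calling strtoull directly. This is only a sketch for implementations that follow the quoted wording; the struct StrtoullResult and the function names are invented for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical result type: the converted value plus errno after the call.
struct StrtoullResult
{
    unsigned long long value;
    int error;
};

// Hypothetical helper: calls strtoull in base 10 with errno cleared first.
StrtoullResult call_strtoull(const char* text)
{
    errno = 0;
    StrtoullResult r;
    r.value = std::strtoull(text, nullptr, 10);
    r.error = errno;
    return r;
}

// Usage example: "-10" is converted to 10 and then negated in the
// unsigned return type, so the result wraps around instead of failing.
void strtoull_example()
{
    StrtoullResult r = call_strtoull("-10");
    assert(r.value == ULLONG_MAX - 9);  // 0 - 10 modulo 2^64
    assert(r.error != ERANGE);          // not reported as out of range
}
```

    So a leading minus sign alone does not trigger the range error from paragraph 8, which is why the stream extraction does not fail either.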


    Historic note: C++03

    C++03 used scanf inside the num_get conversion. Unfortunately, I'm not quite sure (yet) how the conversion for scanf is specified, and under which circumstances errors occur.


    An explicit error check:

    We can insert that check manually, either by converting into a signed value and testing for < 0, or by looking for the - character (which isn't a good idea because of possible localization issues).
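    A minimal sketch of the first option, extracting into a wider signed type and rejecting negative or too-large values. The name parse_unsigned is hypothetical, and the sketch does not look at trailing characters after the number.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: parses text into an unsigned int, rejecting negatives.
// Returns false (and leaves out untouched) if the input is not acceptable.
bool parse_unsigned(const std::string& text, unsigned& out)
{
    std::istringstream in(text);
    long long wide = 0;

    if (!(in >> wide))   // not a number, or does not even fit into long long
        return false;
    if (wide < 0)        // the explicit sign check
        return false;
    if (static_cast<unsigned long long>(wide) > std::numeric_limits<unsigned>::max())
        return false;    // non-negative, but too large for unsigned int

    out = static_cast<unsigned>(wide);
    return true;
}

// Usage example.
void parse_unsigned_example()
{
    unsigned value = 0;
    assert(parse_unsigned("42", value) && value == 42);
    assert(!parse_unsigned("-10", value));
    assert(value == 42);  // untouched after the rejected input
}
```

    This approach only works when a signed type wider than the target exists; for unsigned long long it cannot cover the full range.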

  • 2021-01-13 03:58

    A num_get facet that supports the explicit signedness check. For unsigned types it rejects any non-zero number beginning with a '-' (after white space), and it uses the default C locale's num_get to do the actual conversion.

    #include <locale>
    #include <istream>
    #include <ios>
    #include <iterator>     // std::istreambuf_iterator
    #include <type_traits>  // std::is_unsigned
    
    template <class charT, class InputIterator = std::istreambuf_iterator<charT> >
    class num_get_strictsignedness : public std::num_get <charT, InputIterator>
    {
    public:
        typedef charT char_type;
        typedef InputIterator iter_type;
    
        explicit num_get_strictsignedness(std::size_t refs = 0)
            : std::num_get<charT, InputIterator>(refs)
        {}
        ~num_get_strictsignedness()
        {}
    
    private:
        #define DEFINE_DO_GET(TYPE) \
            virtual iter_type do_get(iter_type in, iter_type end,      \
                std::ios_base& str, std::ios_base::iostate& err,       \
                TYPE& val) const override                              \
            {  return do_get_templ(in, end, str, err, val);  }         // MACRO END
    
        DEFINE_DO_GET(unsigned short)
        DEFINE_DO_GET(unsigned int)
        DEFINE_DO_GET(unsigned long)
        DEFINE_DO_GET(unsigned long long)
    
        // not sure if a static locale::id is required..
    
        template <class T>
        iter_type do_get_templ(iter_type in, iter_type end, std::ios_base& str,
                               std::ios_base::iostate& err, T& val) const
        {
            using namespace std;
    
            if(in == end)
            {
                err |= ios_base::eofbit;
                return in;
            }
    
            // leading white spaces have already been discarded by the
            // formatted input function (via sentry's constructor)
    
            // (assuming that) the sign, if present, has to be the first character
            // for the formatting required by the locale used for conversion
    
            // use the "C" locale; could use any locale, e.g. as a data member
    
            // note: the signedness check isn't actually required
            //       (because we only overload the unsigned versions)
            bool do_check = false;
            if(std::is_unsigned<T>{} && *in == '-')
            {
                ++in;  // not required
                do_check = true;
            }
    
            in = use_facet< num_get<charT, InputIterator> >(locale::classic())
                     .get(in, end, str, err, val);
    
            if(do_check && 0 != val)
            {
                err |= ios_base::failbit;
                val = 0;
            }
    
            return in;
        }
    };
    

    Usage example:

    #include <sstream>
    #include <iostream>
    int main()
    {
        std::locale loc( std::locale::classic(),
                         new num_get_strictsignedness<char>() );
        std::stringstream ss("-10");
        ss.imbue(loc);
        unsigned int ui = 42;
        ss >> ui;
        std::cout << "ui = "<<ui << std::endl;
        if(ss)
        {
            std::cout << "extraction succeeded" << std::endl;
        }else
        {
            std::cout << "extraction failed" << std::endl;
        }
    }
    

    Notes:

    • the allocation on the free store is not required; you could use e.g. a (static) local variable whose ref counter is initialized with 1 in the ctor
    • every character type you want to support (like char, wchar_t, charXY_t) needs its own facet (these can be different instantiations of the num_get_strictsignedness template)
    • "-0" is accepted
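    The first note can be sketched as follows. To keep the example short it uses a stand-in facet, passthrough_num_get, which is invented here and only forwards to std::num_get<char>; the same pattern should apply to num_get_strictsignedness<char>. Passing 1 as refs tells the locale not to delete the facet, so it can live in a static local variable.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in facet: behaves exactly like std::num_get<char>.
struct passthrough_num_get : std::num_get<char>
{
    explicit passthrough_num_get(std::size_t refs)
        : std::num_get<char>(refs)
    {}
};

// Hypothetical helper: parses text with a locale that uses a static facet.
unsigned parse_with_static_facet(const std::string& text, bool& ok)
{
    // refs == 1: the locale never deletes the facet, so no new is needed.
    static passthrough_num_get facet(1);

    std::locale loc(std::locale::classic(), &facet);
    std::istringstream in(text);
    in.imbue(loc);

    unsigned value = 0;
    ok = static_cast<bool>(in >> value);
    return value;
}

// Usage example: the facet is reused across calls.
void static_facet_example()
{
    bool ok = false;
    assert(parse_with_static_facet("42", ok) == 42 && ok);
    assert(parse_with_static_facet("7", ok) == 7 && ok);
}
```

    The facet must outlive every locale and stream that refers to it, which is why a static (rather than automatic) variable is used in this sketch.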