I'm writing part of a program that parses and validates user input passed as console arguments. I chose to use stringstream for that purpose, but encountered a problem.
Version disclaimer: The answer is different for C++03. The following deals with C++11.
First, let's analyse what's happening.
ss >> res;
This calls std::istream::operator>>(unsigned int&). In [istream.formatted.arithmetic]/1, the effects are defined as follows:
These extractors behave as formatted input functions (as described in 27.7.2.2.1). After a sentry object is constructed, the conversion occurs as if performed by the following code fragment:
typedef num_get< charT,istreambuf_iterator<charT,traits> > numget;
iostate err = iostate::goodbit;
use_facet< numget >(loc).get(*this, 0, *this, err, val);
setstate(err);
In the above fragment, loc stands for the private member of the basic_ios class.
Following the formatted input functions to [istream::sentry], the main effect of the sentry object here is to consume leading white-space characters. It also prevents execution of the code shown above in case of an error (the stream is in a failed / eof state).
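To make the quoted fragment concrete, here is a minimal sketch that performs the same steps by hand on a narrow stream. The helper names extract_by_hand and extract_from_string are made up for illustration, and the stream's own locale is obtained via getloc(), since the private loc member is not accessible from outside.

```cpp
#include <ios>
#include <istream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the standard library): performs by hand
// what operator>>(unsigned int&) is specified to do.
bool extract_by_hand(std::istream& is, unsigned int& val)
{
    typedef std::istreambuf_iterator<char> iter;
    typedef std::num_get<char, iter> numget;

    // The sentry skips leading white space and evaluates to false
    // if the stream is already in an error state.
    std::istream::sentry guard(is);
    if (guard)
    {
        std::ios_base::iostate err = std::ios_base::goodbit;
        // getloc() returns the locale imbued in the stream.
        std::use_facet<numget>(is.getloc()).get(iter(is), iter(), is, err, val);
        is.setstate(err);
    }
    return !is.fail();
}

// Convenience wrapper (also hypothetical) to try the helper on a string.
bool extract_from_string(const std::string& text, unsigned int& val)
{
    std::istringstream iss(text);
    return extract_by_hand(iss, val);
}
```

Calling extract_from_string("   42", v) should skip the blanks and store 42, while input without any digits should leave the stream in a failed state.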
The locale used is the "C" locale. Rationale:
For a stringstream constructed via stringstream ss(s);, the locale of that iostream is the current global locale at the time of construction (that's guaranteed deep down in the rabbit hole at [ios.base.locales]/4). As the global locale hasn't been changed in the OP's program, [locale.cons]/2 specifies the "classic" locale, i.e. the "C" locale.
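The locale claim can be checked directly. The following is a small sketch, assuming nothing in the program has called std::locale::global beforehand; both function names are invented for this example.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: name of the locale a freshly constructed
// stringstream is imbued with.
std::string default_stream_locale_name()
{
    std::stringstream ss("123");
    return ss.getloc().name();
}

// Hypothetical helper: true if a fresh stringstream uses the classic locale.
// Assumes the global locale has not been replaced earlier in the program.
bool default_stream_uses_classic_locale()
{
    std::stringstream ss("123");
    return ss.getloc() == std::locale::classic();
}
```

If the global locale is changed before the stream is constructed, both results are expected to change with it, because the stream copies the global locale at construction time.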
use_facet< numget >(loc).get uses the member function num_get<char>::get(iter_type in, iter_type end, ios_base&, ios_base::iostate& err, unsigned int& v) const; specified in [locale.num.get] (note the unsigned int, everything is still fine). The details of the string -> unsigned int conversion for the "C" locale are lengthy and described in [facet.num.get.virtuals]. Some interesting details:
- For an unsigned integer value, the function strtoull is used.
- If the conversion fails, ios_base::failbit is assigned to err. Specifically: "The numeric value to be stored can be one of: [...] the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err."
We need to go to C99, 7.20.1.4 for the definition of strtoull, under paragraph 5:
If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).
and under paragraph 8:
If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.
It seems that it has been debated in the past whether negative values are valid input for strtoul. In any case, the problem lies with this function. A quick check on gcc shows that such input is considered valid, which explains the behaviour you observed.
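The two quoted C99 paragraphs can be observed with a short sketch. The struct ull_result and the function convert_with_strtoull are names made up for this example; only std::strtoull, errno and ERANGE come from the library.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical result type: the converted value plus whether
// strtoull reported a range error through errno.
struct ull_result
{
    unsigned long long value;
    bool out_of_range;
};

// Hypothetical helper: base-10 conversion with an explicit ERANGE check.
ull_result convert_with_strtoull(const char* text)
{
    errno = 0;  // strtoull only sets errno, it never clears it
    ull_result r;
    r.value = std::strtoull(text, nullptr, 10);
    r.out_of_range = (errno == ERANGE);
    return r;
}
```

Under the paragraph 5 wording, "-10" is converted to 10 and then negated in the return type, so the value wraps around to ULLONG_MAX - 9 rather than being rejected. A string of digits too large for unsigned long long falls under paragraph 8 instead and sets ERANGE.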
Historic note: C++03
C++03 used scanf inside the num_get conversion. Unfortunately, I'm not quite sure (yet) how the conversion for scanf is specified, and under which circumstances errors occur.
An explicit error check:
We can manually insert that check either by using a signed value for the conversion and testing for < 0, or by looking for the - character (which isn't a good idea because of possible localization issues).
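The first option (convert to a signed type, then test the sign) might look like the sketch below. The function name parse_nonnegative is hypothetical, and the code assumes that long long is wider than unsigned int, which holds on common platforms but is not guaranteed by the standard.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: extracts into a wider signed type first, then rejects
// negative values and values that do not fit into unsigned int.
// Assumption: long long can represent every unsigned int value.
bool parse_nonnegative(const std::string& text, unsigned int& out)
{
    std::istringstream iss(text);
    long long tmp = 0;
    if (!(iss >> tmp))
        return false;  // not a number, or out of range for long long
    if (tmp < 0)
        return false;  // the explicit signedness check
    if (tmp > static_cast<long long>(std::numeric_limits<unsigned int>::max()))
        return false;  // too large for the target type
    out = static_cast<unsigned int>(tmp);
    return true;
}
```

With this approach "-10" is rejected and the output variable is left untouched, while "-0" is still accepted because it converts to zero. Trailing characters after the number are not checked here.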
A num_get facet to support the explicit check for signedness. It rejects any non-zero number beginning with a '-' (after white space) for unsigned types and uses the default C locale's num_get to do the actual conversion.
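#include <locale>
#include <istream>
#include <ios>
#include <algorithm>
#include <iterator>     // std::istreambuf_iterator
#include <type_traits>  // std::is_unsigned
#include <cstddef>      // std::size_t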
template <class charT, class InputIterator = std::istreambuf_iterator<charT> >
class num_get_strictsignedness : public std::num_get <charT, InputIterator>
{
public:
    typedef charT char_type;
    typedef InputIterator iter_type;

    explicit num_get_strictsignedness(std::size_t refs = 0)
        : std::num_get<charT, InputIterator>(refs)
    {}
    ~num_get_strictsignedness()
    {}

private:
    #define DEFINE_DO_GET(TYPE) \
        virtual iter_type do_get(iter_type in, iter_type end, \
            std::ios_base& str, std::ios_base::iostate& err, \
            TYPE& val) const override \
        { return do_get_templ(in, end, str, err, val); } // MACRO END

    DEFINE_DO_GET(unsigned short)
    DEFINE_DO_GET(unsigned int)
    DEFINE_DO_GET(unsigned long)
    DEFINE_DO_GET(unsigned long long)

    // no static locale::id member is declared here: the facet inherits
    // std::num_get's id, so it replaces the default num_get in the locale

    template <class T>
    iter_type do_get_templ(iter_type in, iter_type end, std::ios_base& str,
                           std::ios_base::iostate& err, T& val) const
    {
        using namespace std;

        if(in == end)
        {
            err |= ios_base::eofbit;
            return in;
        }

        // leading white spaces have already been discarded by the
        // formatted input function (via sentry's constructor)

        // (assuming that) the sign, if present, has to be the first character
        // for the formatting required by the locale used for conversion

        // use the "C" locale; could use any locale, e.g. as a data member

        // note: the signedness check isn't actually required
        // (because we only override the unsigned versions)
        bool do_check = false;
        if(std::is_unsigned<T>{} && *in == '-')
        {
            ++in; // not required
            do_check = true;
        }

        in = use_facet< num_get<charT, InputIterator> >(locale::classic())
                 .get(in, end, str, err, val);

        // a leading '-' is only tolerated for the value zero
        if(do_check && 0 != val)
        {
            err |= ios_base::failbit;
            val = 0;
        }

        return in;
    }
};
Usage example:
#include <sstream>
#include <iostream>

int main()
{
    std::locale loc( std::locale::classic(),
                     new num_get_strictsignedness<char>() );
    std::stringstream ss("-10");
    ss.imbue(loc);
    unsigned int ui = 42;
    ss >> ui;
    std::cout << "ui = " << ui << std::endl;
    if(ss)
    {
        std::cout << "extraction succeeded" << std::endl;
    }else
    {
        std::cout << "extraction failed" << std::endl;
    }
}
Notes:
- [...] 1 in the ctor
- for each character type (char, wchar_t, charXY_t), you need to add an own facet (can be different instantiations of the num_get_strictsignedness template)
- "-0" is accepted