Question
Given string foo, I've written answers on how to use cctype's tolower to convert the characters to lowercase:
transform(cbegin(foo), cend(foo), begin(foo), static_cast<int (*)(int)>(tolower));
But I've begun to consider locale's tolower, which could be used like this:
use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), foo.size()));
- Is there a reason to prefer one of these over the other?
- Does their functionality differ at all?
- I mean other than the fact that tolower accepts and returns an int, which I assume is just some antiquated C stuff?
Answer 1:
Unfortunately, both are equally bad. Although std::string pretends to be a UTF-8 encoded string, none of its methods/functions (including tolower) are really UTF-8 aware. So, while tolower / tolower + locale may work with characters which are single byte (= ASCII), they will fail for every other set of languages.
On Linux, I'd use the ICU library. On Windows, I'd use the CharUpper/CharLower functions.
Answer 2:
In the first case (cctype) the locale is set implicitly:
Converts the given character to lowercase according to the character conversion rules defined by the currently installed C locale.
http://en.cppreference.com/w/cpp/string/byte/tolower
In the second (locale's) case you have to explicitly set the locale:
Converts parameter c to its lowercase equivalent if c is an uppercase letter and has a lowercase equivalent, as determined by the ctype facet of locale loc. If no such conversion is possible, the value returned is c unchanged.
http://www.cplusplus.com/reference/locale/tolower/
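To make the implicit/explicit distinction concrete, here is a minimal sketch. The helper names lower_c_locale and lower_with_locale are hypothetical, chosen only for illustration. The first uses the one-argument std::tolower from <cctype>, which consults whatever C locale is currently installed; the second uses the two-argument std::tolower from <locale>, where the locale is passed in by the caller.

```cpp
#include <algorithm>
#include <cctype>
#include <locale>
#include <string>

// <cctype> version: the locale is implicit. It is the global C locale,
// which only changes through std::setlocale.
std::string lower_c_locale(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// <locale> version: the locale is an explicit argument, so two calls can
// use two different locales without touching any global state.
std::string lower_with_locale(std::string s, const std::locale& loc) {
    std::transform(s.begin(), s.end(), s.begin(), [&loc](char c) {
        return std::tolower(c, loc);
    });
    return s;
}
```

For plain ASCII input with the default "C" locale and std::locale::classic(), both helpers should produce the same result; they can only diverge once the C locale and the std::locale passed in disagree.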
Answer 3:
It should be noted that the language designers were aware of cctype's tolower when locale's tolower was created. It improved in 2 primary ways:
- As is mentioned in progressive_overload's answer, the locale version allows the use of the facet ctype, even a user-modified one, without requiring the shuffling in of a new LC_CTYPE via setlocale and the restoration of the previous LC_CTYPE
- From section 7.1.6.2 [dcl.type.simple] 3:
It is implementation-defined whether objects of char type are represented as signed or unsigned quantities. The signed specifier forces char objects to be signed
Which creates the potential for undefined behavior with the cctype version of tolower if its argument:
Is not representable as unsigned char and does not equal EOF
So there is an additional conversion of the input to unsigned char (and of the int result back to char) required by the cctype version of tolower, yielding:
transform(cbegin(foo), cend(foo), begin(foo), [](const unsigned char i){ return tolower(i); });
Since the locale version operates directly on chars there is no need for a type conversion.
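As a rough illustration of the conversion described above, the sketch below wraps the cctype call so that a char with its high bit set (negative where char is signed) is first converted to unsigned char. The names to_lower_safe and lower_in_place are made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Convert to unsigned char before calling std::tolower, so the argument is
// always representable as unsigned char, then narrow the int result back.
char to_lower_safe(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Apply the safe wrapper to every char of the string, in place.
void lower_in_place(std::string& s) {
    std::transform(s.cbegin(), s.cend(), s.begin(), to_lower_safe);
}
```

Passing the raw char straight to std::tolower would compile just as well, which is what makes the problem easy to miss: it only surfaces for non-ASCII bytes on implementations where char is signed.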
So if you don't need to perform the conversion in a different facet ctype, it simply becomes a style question of whether you prefer the transform with a lambda required by the cctype version, or whether you prefer the locale version's:
use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), size(foo)));
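For completeness, a self-contained sketch of the facet-based call, assuming a hypothetical helper named lower_with_facet that takes the locale as a parameter instead of reading it from cout. It relies on the range overload of std::ctype<char>::tolower, which lowercases the chars in [low, high) in place.

```cpp
#include <locale>
#include <string>

// Lowercase s in place using the ctype<char> facet of loc.
// The default argument is a copy of the current global C++ locale.
void lower_with_facet(std::string& s, const std::locale& loc = std::locale()) {
    if (s.empty()) {
        return;  // nothing to convert; avoids forming a range on an empty buffer
    }
    const auto& facet = std::use_facet<std::ctype<char>>(loc);
    facet.tolower(&s[0], &s[0] + s.size());
}
```

A locale constructed with a user-supplied ctype facet can be passed as loc in the same way, which is the flexibility the first bullet above refers to.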
Source: https://stackoverflow.com/questions/37482246/which-tolower-in-c