Question
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string s;
    cout << "enter the string :" << endl;
    cin >> s;
    for (size_t i = 0; i < s.length(); i++)
        s[i] ^= 32;
    cout << "modified string is : " << s << endl;
    return 0;
}
I saw this code, which converts uppercase to lowercase, on Stack Overflow.
But I don't understand the line s[i] = s[i]^32.
How does it work?
Answer 1:
^= is the exclusive-or assignment operator. 32 is 100000 in binary, so ^= 32 toggles bit 5 of the destination (counting from 0 at the least significant bit). In ASCII, corresponding lower and upper case letters are 32 positions apart, so this converts lower case to upper case, and also the other way.
But it only works for ASCII, not for Unicode in general, for example, and only for letters. To write portable C++, you should not assume the character encoding is ASCII, so please don't use such code. @πάντα ῥεῖ's answer shows a way to do it properly.
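To make the "only for letters" caveat concrete, here is a minimal sketch. Both function names are hypothetical helpers introduced for illustration, not part of the original answer, and both assume an ASCII execution character set.

```cpp
#include <string>

// Hypothetical helper: applies the XOR trick to every character,
// exactly as the loop in the question does.
std::string xor32_all(std::string s)
{
    for (char& c : s)
        c ^= 32;  // flips bit 5 whether or not c is a letter
    return s;
}

// Hypothetical helper: flips case only for ASCII letters and leaves
// every other character untouched. Still assumes ASCII.
std::string toggle_ascii_case(std::string s)
{
    for (char& c : s) {
        const bool is_upper = (c >= 'A' && c <= 'Z');
        const bool is_lower = (c >= 'a' && c <= 'z');
        if (is_upper || is_lower)
            c ^= 32;
    }
    return s;
}
```

With these, xor32_all("Hello 1") flips the letters but also turns the space into a NUL byte and '1' into a control character, while toggle_ascii_case("Hello 1") gives "hELLO 1". Neither version is portable to non-ASCII encodings.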
Answer 2:
How does it work?
Let's look at the ASCII value of 'A': 'A' is binary 1000001. XORing it with 32 (binary 0100000) flips the one bit that distinguishes upper case from lower case:
    1000001
XOR 0100000
  = 1100001 == 'a' in ASCII.
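The bit patterns above can be checked with std::bitset. This is only a sketch: bits7 is a hypothetical helper name, and the results described assume ASCII.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: returns the low 7 bits of c as a string of '0'/'1',
// most significant bit first.
std::string bits7(char c)
{
    return std::bitset<7>(static_cast<unsigned char>(c)).to_string();
}
```

On an ASCII system, bits7('A') is "1000001" and bits7('A' ^ 32) is "1100001", which is the same pattern as bits7('a').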
Any sane and portable C or C++ application should use tolower():
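#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string s;
    cout << "enter the string :" << endl;
    cin >> s;
    // The cast to unsigned char matters: passing a negative char value
    // to tolower() is undefined behavior.
    for (size_t i = 0; i < s.length(); i++)
        s[i] = tolower((unsigned char)s[i]);
    cout << "modified string is : " << s << endl;
    return 0;
}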
The s[i]=s[i]^32 (cargo cult) magic relies on the ASCII table's specific mapping of characters to numeric char values.
There are other char code tables, e.g. EBCDIC, where the s[i]=s[i]^32 method miserably fails to produce the corresponding lower case letters.
There's a more sophisticated C++ way of converting to lower case characters shown on the reference documentation page of std::ctype::tolower().
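As a rough sketch of both approaches (not the code from the reference page itself), the following shows std::tolower applied through std::transform and the range overload of std::ctype<char>::tolower. The function names to_lower_cctype and to_lower_facet are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <locale>
#include <string>

// Hypothetical helper: lower-cases each character with std::tolower.
// The lambda takes unsigned char so a negative char never reaches tolower.
std::string to_lower_cctype(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: lower-cases the whole buffer in place using the
// std::ctype<char> facet of the given locale (default: the global locale).
std::string to_lower_facet(std::string s, const std::locale& loc = std::locale())
{
    if (!s.empty()) {
        const auto& facet = std::use_facet<std::ctype<char>>(loc);
        facet.tolower(&s[0], &s[0] + s.size());
    }
    return s;
}
```

Both leave non-letters alone, so "Hello, World 1!" becomes "hello, world 1!" in the default "C" locale. Neither handles multi-byte encodings such as UTF-8 beyond the single-byte characters the locale knows about.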
Answer 3:
In C++, as in its predecessor C, a char is a numeric type. This is, after all, how characters are represented on the hardware, and these languages don't hide that from you.
In ASCII, letters have the useful property that the difference between an uppercase and a lowercase letter is a single binary bit: bit 5 (numbering from the right, starting at 0).
Uppercase A is represented by the byte 0b01000001 (0x41 in hex), and lowercase a is represented by the byte 0b01100001 (0x61 in hex). Notice that the only difference between uppercase and lowercase A is bit 5. This pattern continues from B to Z.
So, when you do ^= 32 (which, incidentally, is 2 to the 5th power) on a number that represents an ASCII letter, it toggles bit 5: if it is 0, it becomes 1, and vice versa, which changes the letter from upper to lower case or from lower to upper case.
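Since the case of an ASCII letter lives in a single bit, the same mask can toggle, set, or clear it. The sketch below uses hypothetical names (kCaseBit, toggle_case, force_lower, force_upper) and is only meaningful for ASCII letters; for any other input the results are arbitrary.

```cpp
// Bit 5 is the only bit that differs between 'A' (0x41) and 'a' (0x61) in ASCII.
constexpr unsigned char kCaseBit = 1u << 5;  // 2 to the 5th power, i.e. 32 or 0x20

// XOR toggles the bit: upper becomes lower, lower becomes upper.
constexpr char toggle_case(char c) { return static_cast<char>(c ^ kCaseBit); }

// OR sets the bit: the result is lower case whatever the input case was.
constexpr char force_lower(char c) { return static_cast<char>(c | kCaseBit); }

// AND with the inverted mask clears the bit: the result is upper case.
constexpr char force_upper(char c) { return static_cast<char>(c & ~kCaseBit); }

static_assert(kCaseBit == 32, "bit 5 has the value 32");
```

On an ASCII system, toggle_case('A') is 'a', while force_lower and force_upper are idempotent: applying force_lower to 'q' or 'Q' gives 'q' either way.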
Source: https://stackoverflow.com/questions/40641468/how-does-si-32-convert-upper-to-lower-case