How does s[i]^=32 convert upper to lower case?

Posted by 孤街浪徒 on 2019-12-10 03:32:00

Question


#include <iostream>
#include <string>
using namespace std;

int main()
{
    string s;
    cout << "enter the string :" << endl;
    cin >> s;
    for (int i = 0; i < s.length(); i++)
        s[i] ^= 32;
    cout << "modified string is : " << s << endl;
    return 0;
}

I saw this code, which converts uppercase to lowercase, on Stack Overflow.

But I don't understand the line s[i] ^= 32, i.e. s[i] = s[i] ^ 32.

How does it work?


Answer 1:


^= is the exclusive-or assignment operator. 32 is 100000 in binary, so ^= 32 flips bit 5 of the destination (counting from 0 at the least significant bit). In ASCII, each lower case letter is exactly 32 positions above its upper case counterpart, so this converts upper to lower case, and also the other way.

But it only works for ASCII (not for Unicode, for example), and only for letters: digits, spaces and punctuation are changed into unrelated characters. To write portable C++, you should not assume the character encoding to be ASCII, so please don't use such code. @πάντα ῥεῖ's answer shows a way to do it properly.
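To make the "only for letters" caveat concrete, here is a minimal sketch that assumes an ASCII-compatible character set. The function names toggle_ascii_case and xor_every_char are made up for this illustration and are not from the original post.

```cpp
#include <string>

// Hypothetical helper: toggle the case of ASCII letters only and leave
// every other character unchanged. Assumes an ASCII-compatible encoding.
std::string toggle_ascii_case(std::string s) {
    for (char& c : s) {
        const bool is_upper = (c >= 'A' && c <= 'Z');
        const bool is_lower = (c >= 'a' && c <= 'z');
        if (is_upper || is_lower) {
            c ^= 32;  // flip bit 5: 'A' (65) <-> 'a' (97)
        }
    }
    return s;
}

// The unguarded loop from the question, kept for comparison.
std::string xor_every_char(std::string s) {
    for (char& c : s) {
        c ^= 32;  // also rewrites digits, spaces and punctuation
    }
    return s;
}
```

With the guard, "Hello, World 42" becomes "hELLO, wORLD 42". Without it, the digit '1' (0x31) turns into the control character 0x11 and a space (0x20) turns into a NUL byte.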




Answer 2:


How does it work?

Let's look at the ASCII value of 'A':

'A' is binary 1000001

XORed with 32 (binary 0100000)

yields the same value with the bit that distinguishes lower case from upper case set:

1000001 XOR 0100000 = 1100001 == 'a' in ASCII.
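The arithmetic above can be reproduced with std::bitset. This is only an illustrative sketch: the helper name bits7 is invented here, and the results assume the compiler uses an ASCII-compatible character set.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: return the low 7 bits of a character as a string of
// '0' and '1' digits, most significant bit first.
std::string bits7(char c) {
    return std::bitset<7>(static_cast<unsigned char>(c)).to_string();
}
```

On an ASCII system, bits7('A') gives "1000001", bits7(32) gives "0100000", and bits7('A' ^ 32) gives "1100001", which is the bit pattern of 'a'.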


Any sane and portable C or C++ application should use tolower():

#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string s;
    cout<<"enter the string :"<<endl;
    cin>>s;
    for (int i=0;i<s.length();i++) s[i] = tolower( (unsigned char)s[i] );
                                     // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    cout<<"modified string is : "<<s<<endl;
    return 0;
}

The s[i] = s[i] ^ 32 (cargo cult) magic relies on the ASCII-specific mapping of characters to numeric char values.

There are other character code tables, e.g. EBCDIC, where the s[i] = s[i] ^ 32 method fails miserably to produce the corresponding lower case letters.
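As a rough illustration of the EBCDIC failure, the sketch below compares the code points of the letter A in both encodings. The constants are written as plain numbers (ASCII 0x41/0x61, EBCDIC 0xC1/0x81) so the check does not depend on the compiler's own character set; the names are invented for this example.

```cpp
// Code points for the letter A, given numerically (illustrative names).
constexpr unsigned char ascii_upper_a  = 0x41;
constexpr unsigned char ascii_lower_a  = 0x61;
constexpr unsigned char ebcdic_upper_a = 0xC1;
constexpr unsigned char ebcdic_lower_a = 0x81;

// The same operation as s[i] ^= 32, applied to a single code point.
constexpr unsigned char flip_bit5(unsigned char c) {
    return static_cast<unsigned char>(c ^ 0x20);
}

// In ASCII the trick lands on the lower case letter ...
static_assert(flip_bit5(ascii_upper_a) == ascii_lower_a, "works in ASCII");
// ... but in EBCDIC it lands on an unrelated code point (0xE1).
static_assert(flip_bit5(ebcdic_upper_a) != ebcdic_lower_a, "fails in EBCDIC");
// In EBCDIC the two cases of A differ in bit 6 (0x40), not bit 5.
static_assert((ebcdic_upper_a ^ ebcdic_lower_a) == 0x40, "EBCDIC differs in 0x40");
```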


There's a more sophisticated C++ way of converting to lower case characters shown on the reference documentation page for std::ctype::tolower().
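One way the std::ctype facet can be used is sketched below. This is not copied from the reference page; the function name to_lower_copy and the choice of the classic "C" locale as the default are assumptions made for the example.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: return a lower-cased copy of s, using the ctype facet
// of the given locale (the classic "C" locale unless another is passed).
std::string to_lower_copy(std::string s,
                          const std::locale& loc = std::locale::classic()) {
    if (!s.empty()) {
        const auto& facet = std::use_facet<std::ctype<char>>(loc);
        // The range overload converts every character in [first, last) in place.
        facet.tolower(&s[0], &s[0] + s.size());
    }
    return s;
}
```

Because the facet decides what counts as a letter, non-letters such as digits and punctuation are left alone, and the code does not hard-wire any bit layout.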




Answer 3:


In C++, like its predecessor C, a char is a numeric type. This is after all how characters are represented on the hardware and these languages don't hide that from you.

In ASCII, letters have the useful property that the difference between an uppercase and a lowercase letter is a single binary bit: the 5th bit (if we start numbering from the right starting at 0).

Uppercase A is represented by the byte 0b01000001 (0x41 in hex), and lowercase a is represented by the byte 0b01100001 (0x61 in hex). Notice that the only difference between uppercase and lowercase A is bit 5. This pattern continues from B to Z.

So, when you do ^= 32 (which, incidentally, is 2 to the 5th power) on a number that represents an ASCII character, what that does is toggle the 5th bit - if it is 0, it becomes 1, and vice versa, which changes the character from upper to lower case and vice versa.
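The claim that the pattern holds from A to Z can be checked with a short loop. This is a sketch that assumes an ASCII-compatible character set; case_bit and xor_toggles_every_letter are names invented for the illustration.

```cpp
// 2 to the 5th power: the single bit that separates the two cases in ASCII.
constexpr int case_bit = 1 << 5;
static_assert(case_bit == 32, "bit 5 has the value 32");

// Hypothetical check: returns true if XOR with case_bit maps each of the 26
// upper case letters to its lower case counterpart and back again.
bool xor_toggles_every_letter() {
    for (char upper = 'A'; upper <= 'Z'; ++upper) {
        const char lower = static_cast<char>('a' + (upper - 'A'));
        if ((upper ^ case_bit) != lower || (lower ^ case_bit) != upper) {
            return false;
        }
    }
    return true;
}
```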



Source: https://stackoverflow.com/questions/40641468/how-does-si-32-convert-upper-to-lower-case
