Encoding.GetEncoding(437).GetString() bug?

被撕碎了的回忆 2021-01-17 23:47

I have the following test program:

    using System.Diagnostics;  // Debug
    using System.Text;         // Encoding

    char c = '§';
    Debug.WriteLine("c: " + (int)c);

    byte b = Encoding.GetEncoding(437).GetBytes("§")[0];
    Debug.WriteLine("b: " + b);

2 Answers
  • 2021-01-18 00:21

    .NET supports two different characters, both of which are (usually) rendered as §:

    char c1 = (char)21;   // U+0015, a control character (NAK)
    char c2 = (char)167;  // U+00A7, SECTION SIGN
    
    Console.WriteLine(c1 == c2);  // prints False
    Console.WriteLine(c1);        // prints § (usually, in a Windows console)
    Console.WriteLine(c2);        // prints §
    

    Character 21 (U+0015, the ASCII NAK code) is a special control character, which is rendered as § when output in text mode, because § is the CP437 glyph for code 21.

    CP437 allows byte 21 to be interpreted either as a control character or as the literal §. Apparently, GetString chooses to interpret it as the control character (which is a perfectly valid option) and thus maps it to the Unicode control character U+0015 rather than to the Unicode literal § (U+00A7).
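    A minimal sketch of the asymmetry, plus one way to get the glyph back if that is what you actually want. It assumes .NET Core 3.0+ / .NET 5+, where code page 437 is only available after registering CodePagesEncodingProvider. Cp437Demo, ControlGlyphs and DecodeAsGlyphs are hypothetical names, not part of .NET; the table holds the classic CP437 screen glyphs for codes 1-31.

    ```csharp
    using System;
    using System.Text;

    public static class Cp437Demo
    {
        // Hypothetical lookup table: the CP437 screen glyphs for codes 1..31
        // (index 0 is code 1, so code 21 is index 20, which is '§').
        private const string ControlGlyphs = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼";

        // Decodes with the built-in encoding, then replaces each control character
        // U+0001..U+001F with the glyph a text-mode console would draw for it.
        public static string DecodeAsGlyphs(Encoding cp437, byte[] bytes)
        {
            char[] chars = cp437.GetString(bytes).ToCharArray();
            for (int i = 0; i < chars.Length; i++)
            {
                if (chars[i] >= '\u0001' && chars[i] <= '\u001F')
                {
                    chars[i] = ControlGlyphs[chars[i] - 1];
                }
            }
            return new string(chars);
        }

        public static void Main()
        {
            // Assumption: .NET Core 3.0+ / .NET 5+, where code page 437 is only
            // available after registering the code-pages provider.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            Encoding cp437 = Encoding.GetEncoding(437);

            byte[] bytes = { 21 };
            string plain = cp437.GetString(bytes);         // control character U+0015
            string glyphs = DecodeAsGlyphs(cp437, bytes);  // SECTION SIGN U+00A7

            Console.WriteLine((int)plain[0]);   // prints 21
            Console.WriteLine((int)glyphs[0]);  // prints 167
        }
    }
    ```

    Remapping every control code also turns CR (13) and LF (10) into ♪ and ◙, so this only suits text that was meant to be drawn on screen, not ordinary text files.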

  • 2021-01-18 00:40

    CP437 does not round-trip for codes in the range 0-31. As stated in the Wikipedia page you linked:

    For many uses, the codes in the range 0 to 31 and the code 127 will not produce these symbols. Some (or all) of them will be interpreted as ASCII control characters.

    Mapping a Unicode character to a supported CP437 character in this range works, but not the other way around. For example, take bytes 13 and 10: if you find them inside CP437 text, chances are you actually want them preserved as carriage return and line feed, not converted to a music note (♪) and an inverse white circle (◙). This behavior is normal: it's not a bug.
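    A short sketch of why the default decoding is the useful one for ordinary text, under the same assumption (.NET Core 3.0+ / .NET 5+, code-pages provider registered before GetEncoding(437) is called). LineBreakDemo, Decode and Encode are hypothetical names. Decoding keeps bytes 13 and 10 as a line break, and encoding the result again gives back the original bytes.

    ```csharp
    using System;
    using System.Linq;
    using System.Text;

    public static class LineBreakDemo
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where code page 437 must be
        // registered through the code-pages provider before it can be looked up.
        private static readonly Encoding Cp437 = CreateCp437();

        private static Encoding CreateCp437()
        {
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(437);
        }

        // Plain built-in conversions, no glyph remapping.
        public static string Decode(byte[] bytes) => Cp437.GetString(bytes);
        public static byte[] Encode(string text) => Cp437.GetBytes(text);

        public static void Main()
        {
            // "AB", CR, LF, "C" as CP437 bytes.
            byte[] bytes = { 65, 66, 13, 10, 67 };
            string text = Decode(bytes);

            // Bytes 13 and 10 come back as "\r\n", so the text still has two lines.
            string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.None);
            Console.WriteLine(lines.Length);  // prints 2
            Console.WriteLine(lines[1]);      // prints C

            // Encoding the decoded text again reproduces the original bytes.
            Console.WriteLine(Encode(text).SequenceEqual(bytes));  // prints True
        }
    }
    ```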
