Question
I have the following test program:
char c = '§';
Debug.WriteLine("c: " + (int)c);
byte b = Encoding.GetEncoding(437).GetBytes("§")[0];
Debug.WriteLine("b: " + b);
char c1 = Encoding.GetEncoding(437).GetString(new byte[] { 21 })[0];
Debug.WriteLine("c1: " + (int)c1);
This produces the following result:
c: 167
b: 21
c1: 21
As far as I can see, GetBytes works correctly:
167 in Unicode => 21 in CP437
but GetString does not:
21 in CP437 => 21 in Unicode
Is this a bug, or is it my mistake?
Answer 1:
CP437 is not "two-way" for characters in the range 0-31. As stated in the Wikipedia page you linked:
For many uses, the codes in the range 0 to 31 and the code 127 will not produce these symbols. Some (or all) of them will be interpreted as ASCII control characters.
Mapping a Unicode character to a CP437 byte in this range works, but the reverse mapping does not. For example, take the characters represented by bytes 13 and 10: if they appear inside a CP437 string, you almost certainly want them preserved as carriage return and line feed, not converted to a musical note (♪) and an inverted circle (◙), which are the CP437 glyphs for those byte values. This behavior is by design; it is not a bug.
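If you do want the graphic glyphs for the low byte values, one option is to post-process the decoded string with a lookup table. The following is only a minimal sketch: `Cp437Glyphs` and `ToGraphics` are hypothetical names, not part of .NET, and the glyph table is taken from the standard CP437 chart.

```csharp
using System;
using System.Text;

static class Cp437Glyphs
{
    // Graphic glyphs that CP437 assigns to bytes 0x00-0x1F (index = byte value).
    // Index 0 stays NUL because that byte has no visible glyph.
    private const string LowGlyphs =
        "\0☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼";

    // Hypothetical helper: replaces control characters in an already-decoded
    // string with their CP437 glyphs. Tab, LF and CR are left alone when
    // keepWhitespace is true, since they are usually meant as real whitespace.
    public static string ToGraphics(string decoded, bool keepWhitespace = true)
    {
        var sb = new StringBuilder(decoded.Length);
        foreach (char ch in decoded)
        {
            bool isWhitespace = ch == '\t' || ch == '\n' || ch == '\r';
            if (ch > 0 && ch < 32 && !(keepWhitespace && isWhitespace))
                sb.Append(LowGlyphs[ch]);   // e.g. 21 -> '§' (U+00A7)
            else if (ch == 127)
                sb.Append('⌂');             // CP437 glyph for byte 127
            else
                sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

With this helper, `Cp437Glyphs.ToGraphics("\u0015")` returns "§" (code point 167), while "\r\n" passes through unchanged unless `keepWhitespace` is set to false.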
Answer 2:
.NET supports two different characters, both of which are (usually) rendered as §:
char c1 = (char)21;
char c2 = (char)167;
Console.WriteLine(c1 == c2); // prints false
Console.WriteLine(c1); // prints §
Console.WriteLine(c2); // prints §
Character 21 is a special control character, which is rendered as § when output in text mode.
CP437 allows for 21 to be interpreted as either a control character or as the literal §. Apparently, GetString chooses to interpret it as the control character (which is a perfectly valid option) and thus maps it to the Unicode control character 21 rather than to the Unicode literal §.
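The difference between the two characters can be checked without relying on how a console renders them. A small sketch, where `CharInfo.Describe` is a hypothetical helper built on the standard `char.IsControl` and `char.GetUnicodeCategory` methods:

```csharp
using System;

static class CharInfo
{
    // Hypothetical helper: summarizes a character's code point, its Unicode
    // general category, and whether .NET treats it as a control character.
    public static string Describe(char ch)
    {
        return $"U+{(int)ch:X4} {char.GetUnicodeCategory(ch)} control={char.IsControl(ch)}";
    }
}
```

`CharInfo.Describe((char)21)` yields "U+0015 Control control=True", whereas `CharInfo.Describe((char)167)` yields "U+00A7 OtherPunctuation control=False".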
Source: https://stackoverflow.com/questions/6984171/encoding-getencoding437-getstring-bug