Question
I'm writing some pretty string-manipulation-intensive code in C#/.NET, and it got me curious about some Joel Spolsky articles I remembered reading a while back:
http://www.joelonsoftware.com/articles/fog0000000319.html
http://www.joelonsoftware.com/articles/Unicode.html
So, how does .NET do it? Two bytes per char? There ARE some Unicode chars^H^H^H^H^H code points that need more than that. And how is the length encoded?
Answer 1:
Before Jon Skeet turns up, here is a link to his excellent blog post on strings in C#.
In the current implementation at least, strings take up 20+(n/2)*4 bytes (rounding the value of n/2 down), where n is the number of characters in the string. The string type is unusual in that the size of the object itself varies.
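To make the formula concrete, here is a minimal sketch that just evaluates that estimate for a few strings. The 20-byte overhead and 4-byte packing are taken from the figures quoted above (a 32-bit-era implementation detail), so treat the constants as an assumption rather than a runtime guarantee:

    using System;

    class StringSizeEstimate
    {
        // Estimate from the answer above: 20 + (n/2)*4 bytes,
        // where integer division rounds n/2 down.
        static int EstimateBytes(int n) => 20 + (n / 2) * 4;

        static void Main()
        {
            foreach (var s in new[] { "", "a", "Hello", "Hello, world" })
            {
                Console.WriteLine($"\"{s}\" ({s.Length} chars) ~ {EstimateBytes(s.Length)} bytes");
            }
        }
    }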
Answer 2:
.NET uses UTF-16.
From System.String on MSDN:
"Each Unicode character in a string is defined by a Unicode scalar value, also called a Unicode code point or the ordinal (numeric) value of the Unicode character. Each code point is encoded using UTF-16 encoding, and the numeric value of each element of the encoding is represented by a Char object."
Source: https://stackoverflow.com/questions/1018915/whats-the-internal-format-of-a-net-string