Question
I wrote the following simple test:
    [Test]
    public void TestUTF8()
    {
        var c = "abc☰def";
        var b = Encoding.UTF8.GetBytes(c);
        Assert.That(b.Length, Is.EqualTo(9));
        // Assume we are reading a byte stream and have received only the first 5 bytes so far.
        var p = Encoding.UTF8.GetChars(b, 0, 5);
        Trace.WriteLine(new string(p));
        Assert.That(p.Length, Is.EqualTo(3));
    }
The Trace outputs abc� and the last assert fails because p.Length is 4.
However, I want Trace to output abc and the last assert to pass. In reality I know the stream contains only valid characters, so when the last few bytes do not yet form a complete character, they should simply be left in place until more data arrives.
So how can I achieve this in C#?
Answer 1:
Encoding.GetChars isn't really designed for bytes coming from a stream, where state has to be tracked during decoding because a single character might span multiple buffer segments. For that you should use a Decoder obtained from Encoding.GetDecoder. Decoder.Convert is quite low-level: it gives you control over both the input and output buffers, but it is somewhat difficult to use. Decoder.GetChars is easier to use and does the important work of storing state between calls. We can easily expand on Peter Duniho's answer to support an arbitrary buffer size:
    using System;
    using System.IO;
    using System.Text;

    public static class Program
    {
        public static void Main(string[] args)
        {
            var c = "abc☰def";
            var b = Encoding.UTF8.GetBytes(c);
            // A 3-byte buffer forces the 3-byte ☰ sequence to be split across two reads.
            var result = DecodeFromStream(new MemoryStream(b), Encoding.UTF8, 3);
            Console.WriteLine(result);
            Console.WriteLine(c == result);
        }

        private static string DecodeFromStream(Stream dataStream, Encoding encoding, int bufferSize)
        {
            // The decoder keeps incomplete trailing bytes between GetChars calls.
            Decoder decoder = encoding.GetDecoder();
            StringBuilder sb = new StringBuilder();
            int inputByteCount;
            byte[] inputBuffer = new byte[bufferSize];
            // Worst-case number of chars that one full input buffer can produce.
            char[] charBuffer = new char[encoding.GetMaxCharCount(inputBuffer.Length)];
            while ((inputByteCount = dataStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
            {
                int readChars = decoder.GetChars(inputBuffer, 0, inputByteCount, charBuffer, 0);
                if (readChars > 0)
                    sb.Append(charBuffer, 0, readChars);
            }
            return sb.ToString();
        }
    }
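To tie this back to the test in the question, here is a minimal sketch of the same idea applied to the 5-byte partial read. The class name PartialUtf8Demo and the helper DecodeChunk are hypothetical names chosen for illustration; only Encoding.GetDecoder, Encoding.GetMaxCharCount and Decoder.GetChars come from the framework. It assumes the same Decoder instance is reused for every chunk, since that instance is what holds the incomplete bytes.

```csharp
using System;
using System.Text;

public static class PartialUtf8Demo
{
    // Hypothetical helper: decodes one chunk of bytes with the given decoder.
    // Bytes of an incomplete trailing character stay buffered inside the
    // decoder and are combined with the next chunk.
    public static string DecodeChunk(Decoder decoder, byte[] bytes, int index, int count)
    {
        // Worst-case size, so the output buffer is never too small.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(count)];
        int written = decoder.GetChars(bytes, index, count, chars, 0);
        return new string(chars, 0, written);
    }

    public static void Main()
    {
        byte[] b = Encoding.UTF8.GetBytes("abc☰def"); // 9 bytes; ☰ takes 3 of them
        Decoder decoder = Encoding.UTF8.GetDecoder();

        // The first 5 bytes end in the middle of ☰, so only "abc" is produced.
        string first = DecodeChunk(decoder, b, 0, 5);
        // The remaining 4 bytes complete ☰ and add "def".
        string second = DecodeChunk(decoder, b, 5, b.Length - 5);

        Console.WriteLine(first.Length);  // prints 3
        Console.WriteLine(first + second == "abc☰def"); // prints True
    }
}
```

With this approach the first call yields 3 chars rather than 4, which is what the last assert in the question expects, and no replacement character appears because the two leftover bytes are held back instead of being decoded on their own.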
Source: https://stackoverflow.com/questions/26900642/c-sharp-partial-utf-8-byte-stream-conversion