How to decode a UTF-8 encoded string split across two buffers right in the middle of a 4-byte character?

清歌不尽 2021-01-21 02:08

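A character in UTF-8 encoding takes up to 4 bytes. Now imagine I read from a stream into one buffer and then into another. Unfortunately, it just happens that the end of the first buffer falls in the middle of a 4-byte character, so the rest of that character's bytes end up at the start of the second buffer.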
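To make the problem concrete, here is a minimal C# sketch of what goes wrong when each buffer is decoded on its own with Encoding.UTF8.GetString. The class and the JoinNaively helper are hypothetical names for illustration only. The string contains the 4-byte character U+1F600, and the split falls after its first two bytes.

```csharp
using System;
using System.Text;

class NaiveSplit
{
    // Hypothetical helper: decodes each "buffer" independently, ignoring the split
    public static string JoinNaively(byte[] bytes, int split)
    {
        string first = Encoding.UTF8.GetString(bytes, 0, split);
        string second = Encoding.UTF8.GetString(bytes, split, bytes.Length - split);
        return first + second;
    }

    static void Main()
    {
        // U+1F600 is 4 bytes in UTF-8 (F0 9F 98 80), so the whole string is 6 bytes
        string original = "A\U0001F600B";
        byte[] bytes = Encoding.UTF8.GetBytes(original);

        // Split after 3 bytes: the first buffer ends halfway through the emoji
        string joined = JoinNaively(bytes, 3);

        // The incomplete pieces are replaced with U+FFFD, so the emoji is lost
        Console.WriteLine(joined == original);        // False
        Console.WriteLine(joined.Contains("\uFFFD")); // True
    }
}
```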

1 Answer
  • 2021-01-21 02:50

    You should use a Decoder, which maintains state between calls to GetChars: it remembers any trailing bytes it hasn't been able to decode yet.

    using System;
    using System.Text;
    
    class Test
    {
        static void Main()
        {
            // U+263A (a smiley) takes 3 bytes in UTF-8, so "Hello" is
            // bytes 0-4 and the smiley is bytes 5-7
            string str = "Hello\u263AWorld";
    
            var bytes = Encoding.UTF8.GetBytes(str);
            var decoder = Encoding.UTF8.GetDecoder();
    
            // Long enough for the whole string
            char[] buffer = new char[100];
    
            // Convert the first "packet": "Hello" plus only the first byte
            // of the smiley, which the decoder holds on to instead of emitting
            var length1 = decoder.GetChars(bytes, 0, 6, buffer, 0);
            // Convert the second "packet", writing into the buffer
            // from where we left off
            // Note: start at 6, not 7; the first call consumed bytes 0-5,
            // so starting at 7 would skip a byte
            var length2 = decoder.GetChars(bytes, 6, bytes.Length - 6,
                                           buffer, length1);
            var reconstituted = new string(buffer, 0, length1 + length2);
            Console.WriteLine(str == reconstituted); // True
        }
    }
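The same idea applied to the stream-reading setting from the question might look like the minimal sketch below. It assumes a plain Stream read in fixed-size chunks; DecodeStream and the tiny buffer sizes are illustrative choices, not part of the answer above. One Decoder instance is reused for every read, and a final call with flush: true tells it that no more bytes are coming. It also uses a genuinely 4-byte character, U+1F600, rather than the 3-byte U+263A.

```csharp
using System;
using System.IO;
using System.Text;

class StreamDecode
{
    // Hypothetical helper: decodes a whole stream, reading bufferSize bytes at a time
    public static string DecodeStream(Stream stream, int bufferSize)
    {
        var decoder = Encoding.UTF8.GetDecoder();
        var bytes = new byte[bufferSize];
        // Worst-case number of chars one chunk can decode to
        var chars = new char[Encoding.UTF8.GetMaxCharCount(bufferSize)];
        var result = new StringBuilder();

        int read;
        while ((read = stream.Read(bytes, 0, bytes.Length)) > 0)
        {
            // An incomplete sequence at the end of the chunk stays inside the
            // decoder and is completed by the bytes of the next chunk
            int charCount = decoder.GetChars(bytes, 0, read, chars, 0);
            result.Append(chars, 0, charCount);
        }

        // No more input: flush any leftover bytes (emitted as U+FFFD if incomplete)
        int tail = decoder.GetChars(bytes, 0, 0, chars, 0, flush: true);
        result.Append(chars, 0, tail);
        return result.ToString();
    }

    static void Main()
    {
        string original = "A\U0001F600B\U0001F600C";
        var stream = new MemoryStream(Encoding.UTF8.GetBytes(original));
        // A 3-byte buffer guarantees each 4-byte emoji straddles two reads
        Console.WriteLine(DecodeStream(stream, 3) == original); // True
    }
}
```

Keeping a single decoder across reads is what makes the split harmless; creating a fresh decoder (or calling Encoding.UTF8.GetString) per chunk would lose the held-over bytes.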
    