How should I decode a UTF-8 string?

清歌不尽 2021-01-12 20:19

I have a string like:

About \\xee\\x80\\x80John F Kennedy\\xee\\x80\\x81\\xe2\\x80\\x99s Assassination . unsolved my         


        
3 Answers
  •  有刺的猬
    2021-01-12 20:45

    If you have a string like that, then you used the wrong encoding when you decoded it in the first place. There is no such thing as a "UTF-8 string": UTF-8 describes the binary data (bytes) you get when text is encoded. Once those bytes are decoded into a string, it is no longer UTF-8.
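    To make the distinction concrete, here is a minimal C# sketch (it assumes .NET 5 or later, for top-level statements). It takes the "Kennedy’s" fragment from the question, treats the curly apostrophe as U+2019, and shows that UTF-8 only appears once the text is encoded into bytes:

```csharp
using System;
using System.Text;

// A .NET string holds UTF-16 chars; it is not "in UTF-8" by itself.
string text = "Kennedy\u2019s"; // U+2019 RIGHT SINGLE QUOTATION MARK

// Encoding the text as UTF-8 produces bytes; the apostrophe becomes E2 80 99.
byte[] utf8 = Encoding.UTF8.GetBytes(text);
Console.WriteLine(BitConverter.ToString(utf8)); // 4B-65-6E-6E-65-64-79-E2-80-99-73

// Decoding those bytes with the same encoding gives the original text back.
string decoded = Encoding.UTF8.GetString(utf8);
Console.WriteLine(decoded == text); // True
```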

    Use the UTF-8 encoding when you create the string from the binary data. Once the string has been created with the wrong encoding, you can't reliably fix it.
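    A minimal sketch of decoding correctly at the point where the bytes become a string, next to a wrong decode for comparison. The byte array is a hypothetical stand-in for data read from a file or network; Latin-1 (`Encoding.Latin1`, .NET 5 or later) is used only as an example of a wrong encoding:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical input: the UTF-8 bytes of "Kennedy’s", as read from a file or socket.
byte[] data = { 0x4B, 0x65, 0x6E, 0x6E, 0x65, 0x64, 0x79, 0xE2, 0x80, 0x99, 0x73 };

// Right: tell the decoder the bytes are UTF-8.
string good = Encoding.UTF8.GetString(data);

// Same idea when reading a stream: pass the encoding to the StreamReader.
using var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8);
string fromStream = reader.ReadToEnd();

// Wrong: Latin-1 turns each byte into one char, so the 3-byte apostrophe
// becomes three unrelated chars (U+00E2, U+0080, U+0099).
string bad = Encoding.Latin1.GetString(data);

Console.WriteLine(good.Length); // 9
Console.WriteLine(bad.Length);  // 11
```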

    If there is no alternative, you could try to fix the string by encoding it again with the same wrong encoding that was used to create it, and then decoding the result with the correct encoding. There is, however, no guarantee that this will work for all strings; some characters are simply lost during the wrong decoding. Example:

    using System.Text;

    // Undo a wrong decoding: re-encode with the encoding wrongly used, then decode as UTF-8
    str = Encoding.UTF8.GetString(Encoding.Default.GetBytes(str));
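    A sketch of the repair, assuming you know which encoding was wrongly used. It is safer to name that encoding explicitly than to rely on `Encoding.Default`: on .NET Framework that is the system's ANSI code page, but on .NET Core and .NET 5 or later it always returns UTF-8, which would make the line above a no-op. Latin-1 and ASCII below are illustrative assumptions about the wrong encoding; the ASCII case shows how characters are lost for good:

```csharp
using System;
using System.Text;

string original = "Kennedy\u2019s";
byte[] utf8 = Encoding.UTF8.GetBytes(original);

// Case 1: the bytes were wrongly decoded as Latin-1. Latin-1 maps all 256 byte
// values to chars and back, so re-encoding with it recovers the exact bytes.
string garbledLatin1 = Encoding.Latin1.GetString(utf8);
string repaired = Encoding.UTF8.GetString(Encoding.Latin1.GetBytes(garbledLatin1));
Console.WriteLine(repaired == original); // True

// Case 2: the bytes were wrongly decoded as ASCII. Every byte above 0x7F was
// replaced by '?', so the original bytes are gone and the repair fails.
string garbledAscii = Encoding.ASCII.GetString(utf8);
string notRepaired = Encoding.UTF8.GetString(Encoding.ASCII.GetBytes(garbledAscii));
Console.WriteLine(notRepaired); // Kennedy???s
```

    If the culprit was Windows-1252, `Encoding.GetEncoding(1252)` only works on .NET Core after registering `CodePagesEncodingProvider.Instance` with `Encoding.RegisterProvider`.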
    
