How to convert UTF-8 byte[] to string?

迷失自我 2020-11-22 03:11

I have a byte[] array that is loaded from a file that I happen to know contains UTF-8.

In some debugging code, I need to convert it to a string. Is

15 answers
  • 2020-11-22 03:44

    Try this console app:

    using System;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string mainString = "Héllo World";
            Console.WriteLine("Main String: " + mainString);

            // Convert the string to UTF-8 bytes.
            byte[] utf8Bytes = Encoding.UTF8.GetBytes(mainString);

            // Convert the UTF-8 bytes back to a string.
            string decodedString = Encoding.UTF8.GetString(utf8Bytes);
            Console.WriteLine("Decoded String: " + decodedString);
        }
    }
    
  • 2020-11-22 03:49

    In addition to the selected answer, if you're using .NET 3.5 or .NET 3.5 CE, you have to specify the index of the first byte to decode and the number of bytes to decode:

    string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
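    The offset/count overload is also handy on any framework version when only part of a buffer holds text. A minimal sketch, assuming a hypothetical 16-byte buffer that is only partly filled (the `SliceDemo` class and its `Decode` helper are illustrative names, not part of the answer):

    ```csharp
    using System;
    using System.Text;

    static class SliceDemo
    {
        // Decodes only the first `count` bytes of `buffer` as UTF-8.
        public static string Decode(byte[] buffer, int count)
        {
            return Encoding.UTF8.GetString(buffer, 0, count);
        }

        static void Main()
        {
            byte[] buffer = new byte[16];

            // Simulate a partial read: write the UTF-8 bytes of "Héllo" into the buffer.
            int written = Encoding.UTF8.GetBytes("H\u00E9llo", 0, 5, buffer, 0);

            Console.WriteLine(written);                  // 6 ("é" takes two bytes)
            Console.WriteLine(Decode(buffer, written));  // Héllo
        }
    }
    ```

    Decoding the whole buffer instead would append the unused zero bytes to the string as U+0000 characters.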
    
  • 2020-11-22 03:50

    I've read the answers in this post, and together they cover the basics well, since C# offers several approaches to the same problem. One thing still worth considering is the difference between pure UTF-8 and UTF-8 with a BOM (byte order mark).

    Last week at work I needed to build a feature that outputs some CSV files with a BOM and others as pure UTF-8 (without a BOM). Each kind of file is consumed by a different non-standardized API: one reads UTF-8 with a BOM, the other reads UTF-8 without one. To build my approach I researched the concept, reading the Stack Overflow discussion "What's the difference between UTF-8 and UTF-8 without B.O.M.?" and the Wikipedia article "Byte order mark".

    In the end, my C# code for both UTF-8 variants (with a BOM and pure) looked similar to the example below:

    // For UTF-8 with a BOM: the same call as the answer shared by Zanoni (at top)
    string resultWithBom = System.Text.Encoding.UTF8.GetString(byteArray);

    // For pure UTF-8 (without a BOM)
    string resultPure = new System.Text.UTF8Encoding(false).GetString(byteArray);
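    One caveat worth checking in your own environment: as far as I know, the `UTF8Encoding` constructor flag only controls the preamble that is emitted when writing, and `GetString` does not strip a BOM with either encoder. A minimal sketch, assuming a hypothetical input of the three BOM bytes followed by "Hi" (`BomDemo` and `StripBom` are illustrative names):

    ```csharp
    using System;
    using System.Text;

    static class BomDemo
    {
        // Removes a leading U+FEFF (the decoded BOM), if present.
        public static string StripBom(string text)
        {
            return text.TrimStart('\uFEFF');
        }

        static void Main()
        {
            // Hypothetical input: the UTF-8 BOM (EF BB BF) followed by "Hi".
            byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x48, 0x69 };

            string viaDefault = Encoding.UTF8.GetString(withBom);
            string viaNoBom = new UTF8Encoding(false).GetString(withBom);

            // Both decoders keep the BOM as a leading U+FEFF character.
            Console.WriteLine(viaDefault.Length);       // 3: U+FEFF, 'H', 'i'
            Console.WriteLine(viaDefault == viaNoBom);  // True

            // The constructor flag changes the preamble used when writing.
            Console.WriteLine(Encoding.UTF8.GetPreamble().Length);            // 3
            Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length);  // 0

            Console.WriteLine(StripBom(viaDefault));    // Hi
        }
    }
    ```

    So when decoding a byte[] that may start with a BOM, trimming the leading U+FEFF yourself (or reading through a StreamReader) is one way to get clean text; the with/without-BOM choice matters mainly when producing the files.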
    
  • 2020-11-22 03:55

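    A more general way to convert a byte array to a string when you don't know the encoding. StreamReader detects the encoding from a byte order mark if one is present and otherwise assumes UTF-8: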

    static string BytesToStringConverted(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        {
            using (var streamReader = new StreamReader(stream))
            {
                return streamReader.ReadToEnd();
            }
        }
    }
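    A usage sketch of the same StreamReader approach, assuming two hypothetical inputs: UTF-16 bytes that start with a BOM, and UTF-8 bytes without one (`DetectDemo` is an illustrative name, and `BytesToString` mirrors the helper above):

    ```csharp
    using System;
    using System.IO;
    using System.Linq;
    using System.Text;

    static class DetectDemo
    {
        // StreamReader picks the encoding from a BOM and falls back to UTF-8.
        public static string BytesToString(byte[] bytes)
        {
            using (var stream = new MemoryStream(bytes))
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }

        static void Main()
        {
            // UTF-16 (little endian) bytes preceded by their BOM (FF FE).
            byte[] utf16 = Encoding.Unicode.GetPreamble()
                .Concat(Encoding.Unicode.GetBytes("H\u00E9llo"))
                .ToArray();
            Console.WriteLine(BytesToString(utf16) == "H\u00E9llo");  // True

            // UTF-8 bytes without a BOM: the UTF-8 fallback applies.
            byte[] utf8 = Encoding.UTF8.GetBytes("H\u00E9llo");
            Console.WriteLine(BytesToString(utf8) == "H\u00E9llo");   // True
        }
    }
    ```

    Bytes in a legacy code page with no BOM are still read as UTF-8 here, so this does not detect arbitrary encodings.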
    
  • 2020-11-22 03:56

    There are at least four different ways to do this conversion.

    1. Encoding's GetString
      Note that you won't be able to get the original bytes back if they are not valid text in that encoding (for example, arbitrary binary data decoded as UTF-8).

    2. BitConverter.ToString
      The output is a "-"-delimited hex string, but there's no built-in .NET method that converts this delimited string back to a byte array.

    3. Convert.ToBase64String
      You can easily convert the output string back to byte array by using Convert.FromBase64String.
      Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.

    4. HttpServerUtility.UrlTokenEncode
      You can easily convert the output string back to a byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is that it needs the System.Web assembly if your project is not a web project.

    A full example:

    byte[] bytes = { 130, 200, 234, 23 }; // A byte array containing non-ASCII (or non-readable) bytes
    
    string s1 = Encoding.UTF8.GetString(bytes); // ���
    byte[] decBytes1 = Encoding.UTF8.GetBytes(s1);  // decBytes1.Length == 10 !!
    // decBytes1 not same as bytes
    // Using UTF-8 or other Encoding object will get similar results
    
    string s2 = BitConverter.ToString(bytes);   // 82-C8-EA-17
    String[] tempAry = s2.Split('-');
    byte[] decBytes2 = new byte[tempAry.Length];
    for (int i = 0; i < tempAry.Length; i++)
        decBytes2[i] = Convert.ToByte(tempAry[i], 16);
    // decBytes2 same as bytes
    
    string s3 = Convert.ToBase64String(bytes);  // gsjqFw==
    byte[] decByte3 = Convert.FromBase64String(s3);
    // decByte3 same as bytes
    
    string s4 = HttpServerUtility.UrlTokenEncode(bytes);    // gsjqFw2
    byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
    // decBytes4 same as bytes
    
  • 2020-11-22 03:56

    Alternatively:

    var byteStr = Convert.ToBase64String(bytes);
    