I have a byte[]
array that is loaded from a file that I happen to known contains UTF-8.
In some debugging code, I need to convert it to a string. Is
Try this console app:
static void Main(string[] args)
{
//Encoding _UTF8 = Encoding.UTF8;
string[] _mainString = { "Héllo World" };
Console.WriteLine("Main String: " + _mainString);
//Convert a string to utf-8 bytes.
byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString[0]);
//Convert utf-8 bytes to a string.
string _stringuUnicode = Encoding.UTF8.GetString(_utf8Bytes);
Console.WriteLine("String Unicode: " + _stringuUnicode);
}
In adition to the selected answer, if you're using .NET35 or .NET35 CE, you have to specify the index of the first byte to decode, and the number of bytes to decode:
string result = System.Text.Encoding.UTF8.GetString(byteArray,0,byteArray.Length);
I saw some answers at this post and it's possible to be considered completed base knowledge, because have a several approaches in C# Programming to resolve the same problem. Only one thing that is necessary to be considered is about a difference between Pure UTF-8 and UTF-8 with B.O.M..
In last week, at my job, I need to develop one functionality that outputs CSV files with B.O.M. and other CSVs with pure UTF-8 (without B.O.M.), each CSV file Encoding type will be consumed by different non-standardized APIs, that one API read UTF-8 with B.O.M. and the other API read without B.O.M.. I need to research the references about this concept, reading "What's the difference between UTF-8 and UTF-8 without B.O.M.?" Stack Overflow discussion and this Wikipedia link "Byte order mark" to build my approach.
Finally, my C# Programming for the both UTF-8 encoding types (with B.O.M. and pure) needed to be similar like this example bellow:
//for UTF-8 with B.O.M., equals shared by Zanoni (at top)
string result = System.Text.Encoding.UTF8.GetString(byteArray);
//for Pure UTF-8 (without B.O.M.)
string result = (new UTF8Encoding(false)).GetString(byteArray);
A general solution to convert from byte array to string when you don't know the encoding:
static string BytesToStringConverted(byte[] bytes)
{
using (var stream = new MemoryStream(bytes))
{
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
}
}
There're at least four different ways doing this conversion.
Encoding's GetString
, but you won't be able to get the original bytes back if those bytes have non-ASCII characters.
BitConverter.ToString
The output is a "-" delimited string, but there's no .NET built-in method to convert the string back to byte array.
Convert.ToBase64String
You can easily convert the output string back to byte array by using Convert.FromBase64String
.
Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.
HttpServerUtility.UrlTokenEncode
You can easily convert the output string back to byte array by using HttpServerUtility.UrlTokenDecode
. The output string is already URL friendly! The downside is it needs System.Web
assembly if your project is not a web project.
A full example:
byte[] bytes = { 130, 200, 234, 23 }; // A byte array contains non-ASCII (or non-readable) characters
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
Alternatively:
var byteStr = Convert.ToBase64String(bytes);