Question
I am trying to create a method that can detect the encoding scheme of a text file. I know there are many out there, but I know for sure my text file will be either ASCII, UTF-8, or UTF-16. I only need to detect these three. Does anyone know a way to do this?
Answer 1:
Use the StreamReader to identify the encoding.
Example:
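using System.IO;
using System.Text;

using (var r = new StreamReader(filename, Encoding.Default, detectEncodingFromByteOrderMarks: true))
{
    richtextBox1.Text = r.ReadToEnd();
    // Only valid after the first read: the BOM-detected encoding, or Encoding.Default if there is no BOM.
    var encoding = r.CurrentEncoding;
}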
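StreamReader only recognizes an encoding from a byte order mark (BOM); for a file without one, CurrentEncoding simply stays at the fallback passed to the constructor. The sketch below wraps the answer's approach in a helper so that behavior can be observed. The class and method names (StreamReaderEncodingDemo, DetectViaStreamReader) are hypothetical, and the fallback encoding is left to the caller.

```csharp
using System.IO;
using System.Text;

static class StreamReaderEncodingDemo
{
    // Hypothetical helper: returns the encoding StreamReader settles on for a file.
    // With a BOM (EF BB BF, FE FF, FF FE, ...) the detected encoding is returned;
    // without one, the caller-supplied fallback is returned unchanged.
    public static Encoding DetectViaStreamReader(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            // CurrentEncoding is not final until the reader has read from the stream.
            reader.ReadToEnd();
            return reader.CurrentEncoding;
        }
    }
}
```

This means the approach cannot tell BOM-less UTF-8 from ASCII (or from BOM-less UTF-16); a byte-level check is needed for those cases.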
Answer 2:
First, open the file in binary mode and read it into memory.
For UTF-8 (or ASCII, which is a subset of UTF-8), do a validation check. You can decode the text using Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes) and catch the exception. If none is thrown, the data is valid UTF-8. Here is the code:
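using System.IO;
using System.Text;

private bool detectUTF8Encoding(string filename)
{
    // Read the raw bytes; no decoding happens here.
    byte[] bytes = File.ReadAllBytes(filename);
    try
    {
        // With an exception fallback, any invalid byte sequence throws
        // instead of being silently replaced with U+FFFD.
        Encoding.GetEncoding("UTF-8", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback).GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}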
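Because every ASCII file is also valid UTF-8, the check above cannot by itself separate the two. A minimal sketch, assuming "ASCII" means every byte is in the range 0x00 to 0x7F: test for that first, and only fall back to UTF-8 validation if it fails. The class and method names (Utf8AsciiChecks, IsAscii, IsValidUtf8) are hypothetical; IsValidUtf8 uses the UTF8Encoding constructor with throwOnInvalidBytes set, which is equivalent to the exception fallback used above.

```csharp
using System.Text;

static class Utf8AsciiChecks
{
    // Pure 7-bit ASCII: no byte above 0x7F.
    public static bool IsAscii(byte[] bytes)
    {
        foreach (byte b in bytes)
        {
            if (b > 0x7F) return false;
        }
        return true;
    }

    // Strict UTF-8 validation: throwOnInvalidBytes = true makes GetString throw
    // DecoderFallbackException on malformed input.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Usage: if IsAscii(bytes) is true, report ASCII; otherwise, if IsValidUtf8(bytes) is true, report UTF-8.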
For UTF-16, check for the BOM: FE FF for big-endian or FF FE for little-endian byte order.
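The BOM check can be sketched as below. It is worth running before the UTF-8 validation: a UTF-16 file with a BOM always fails UTF-8 validation (0xFE and 0xFF never occur in UTF-8), but BOM-less UTF-16 containing only ASCII characters (for example 41 00 42 00) is valid UTF-8, because NUL bytes are legal there. The class and method names (Utf16BomCheck, DetectUtf16Bom) are hypothetical, and the sketch assumes UTF-32 is out of scope, as the question states.

```csharp
using System.Text;

static class Utf16BomCheck
{
    // Returns the UTF-16 variant announced by a BOM, or null if there is none.
    // FE FF = big-endian (Encoding.BigEndianUnicode),
    // FF FE = little-endian (Encoding.Unicode).
    // Assumption: FF FE 00 00 (the UTF-32 LE BOM) is not expected, so it is
    // treated as UTF-16 LE here.
    public static Encoding DetectUtf16Bom(byte[] bytes)
    {
        if (bytes.Length >= 2)
        {
            if (bytes[0] == 0xFE && bytes[1] == 0xFF) return Encoding.BigEndianUnicode;
            if (bytes[0] == 0xFF && bytes[1] == 0xFE) return Encoding.Unicode;
        }
        return null;
    }
}
```

A null result does not prove the file is not UTF-16; BOM-less UTF-16 cannot be identified reliably from the BOM alone.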
Source: https://stackoverflow.com/questions/10522570/determining-text-file-encoding-schema