Determine a string's encoding in C#

小鲜肉 2020-11-22 14:54

Is there any way to determine a string's encoding in C#?

Say, I have a filename string, but I don't know if it is encoded in Unicode UTF-16 or the

9 answers
  •  情歌与酒
    2020-11-22 15:52

    My solution is to use the built-in facilities with some fallbacks.

    I took the strategy from an answer to a similar question on Stack Overflow, but I can't find it now.

    It first checks for a BOM using the built-in logic in StreamReader. If there is a BOM, the detected encoding will be something other than Encoding.Default, and we should trust that result.

    If not, it checks whether the byte sequence is valid UTF-8. If it is, it guesses UTF-8 as the encoding; if not, the result is again Encoding.Default, which on .NET Framework is the system's ANSI code page rather than ASCII.

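    using System.IO;
    using System.Text;

    static Encoding getEncoding(string path) {
        // Open read-only so the check also works on files we cannot write to.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read)) {
            // Pass 1: let StreamReader look for a BOM.
            // leaveOpen: true keeps the stream usable for pass 2.
            using (var reader = new StreamReader(stream, Encoding.Default, true, 1024, true)) {
                reader.Read();
                // Read CurrentEncoding before the reader is disposed.
                var detected = reader.CurrentEncoding;
                if (detected != Encoding.Default) {
                    return detected;
                }
            }

            stream.Position = 0;

            // Pass 2: a strict UTF-8 decoder throws on the first invalid byte sequence.
            using (var reader = new StreamReader(stream, new UTF8Encoding(false, true))) {
                try {
                    reader.ReadToEnd();
                    return Encoding.UTF8;
                }
                catch (DecoderFallbackException) {
                    // Not valid UTF-8: fall back to the system default.
                    return Encoding.Default;
                }
            }
        }
    }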
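
    The UTF-8 validity check described above can also be run on bytes that are already in memory. The following is only a minimal sketch of that one step, not part of the original answer: the class name Utf8Check and the method IsValidUtf8 are hypothetical names chosen for illustration. It relies on UTF8Encoding with throwOnInvalidBytes set to true.

    ```csharp
    using System.Text;

    static class Utf8Check
    {
        // Hypothetical helper: returns true when the bytes decode as strict UTF-8.
        public static bool IsValidUtf8(byte[] bytes)
        {
            // throwOnInvalidBytes: true makes the decoder throw on malformed input
            // instead of silently substituting replacement characters.
            var strict = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);
            try
            {
                strict.GetString(bytes);
                return true;
            }
            catch (DecoderFallbackException)
            {
                return false;
            }
        }
    }
    ```

    For example, Utf8Check.IsValidUtf8(new byte[] { 0xC3, 0xA9 }) should accept the UTF-8 bytes for "é", while new byte[] { 0xE9, 0x74, 0xE9 } ("été" in Latin-1) should be rejected. Pure ASCII input is also valid UTF-8, so a positive result remains a guess rather than proof.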
    
