StreamWriter and UTF-8 Byte Order Marks

走了就别回头了 2020-11-27 19:02

I'm having an issue with StreamWriter and Byte Order Marks. The documentation seems to state that the Encoding.UTF8 encoding has byte order marks enabled but when files are

8 answers
  • 2020-11-27 19:43

    My answer builds on HelloSam's, which contains all the necessary information. However, I believe the OP is asking how to make sure that the BOM is emitted into the file.

    So instead of passing false to the UTF8Encoding constructor, you need to pass true.

        using (var sw = new StreamWriter("text.txt", false, new UTF8Encoding(true)))
    

    Try the code below, then open the resulting files in a hex editor to see which one contains a BOM and which one doesn't.

    using System.IO;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            // UTF8Encoding(false): no BOM is written
            const string nobomtxt = "nobom.txt";
            File.Delete(nobomtxt);
    
            using (Stream stream = File.OpenWrite(nobomtxt))
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
            {
                writer.WriteLine("HelloПривет");
            }
    
            // UTF8Encoding(true): the file starts with the bytes EF BB BF
            const string bomtxt = "bom.txt";
            File.Delete(bomtxt);
    
            using (Stream stream = File.OpenWrite(bomtxt))
            using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
            {
                writer.WriteLine("HelloПривет");
            }
        }
    }
    
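    If you'd rather not reach for a hex editor, the same check can be done in code. This is only a minimal sketch: BomCheck and HasUtf8Bom are hypothetical names, and it simply compares the first bytes of a file against the preamble that UTF8Encoding(true) reports.

    ```csharp
    using System;
    using System.IO;
    using System.Linq;
    using System.Text;

    static class BomCheck
    {
        // Returns true when the file starts with the UTF-8 BOM bytes (EF BB BF).
        public static bool HasUtf8Bom(string path)
        {
            byte[] bom = new UTF8Encoding(true).GetPreamble();
            byte[] head = new byte[bom.Length];

            using (FileStream fs = File.OpenRead(path))
            {
                // Read may return fewer bytes than requested, so loop until
                // the buffer is full or the file ends.
                int total = 0;
                while (total < head.Length)
                {
                    int n = fs.Read(head, total, head.Length - total);
                    if (n == 0) break;
                    total += n;
                }
                return total == bom.Length && head.SequenceEqual(bom);
            }
        }
    }
    ```

    After running the program above, BomCheck.HasUtf8Bom("bom.txt") should come back true and BomCheck.HasUtf8Bom("nobom.txt") false.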
  • 2020-11-27 19:47

    Do you use the same constructor of the StreamWriter for every file? Because the documentation says:

    To create a StreamWriter using UTF-8 encoding and a BOM, consider using a constructor that specifies encoding, such as StreamWriter(String, Boolean, Encoding).

    I was in a similar situation a while ago. I ended up using the Stream.Write method instead of StreamWriter, writing the result of Encoding.GetPreamble() first and then the result of Encoding.GetBytes(stringToWrite).

  • 2020-11-27 19:50

    As someone already pointed out, calling the constructor without the encoding argument does the trick. However, if you want to be explicit, try this:

    using (var sw = new StreamWriter(this.Stream, new UTF8Encoding(false)))
    

    The key is to construct a new UTF8Encoding(false) instead of using Encoding.UTF8. The constructor argument controls whether a BOM is emitted.

    This is the same as calling StreamWriter without the encoding argument; internally it does the same thing.
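    To see the three cases side by side, here is a small in-memory sketch. PreambleDemo and Encode are made-up names for illustration; passing null is just this helper's way of selecting the StreamWriter(Stream) constructor.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    static class PreambleDemo
    {
        // Writes text through a StreamWriter into memory and returns the raw bytes.
        // A null encoding selects the StreamWriter(Stream) constructor.
        public static byte[] Encode(string text, Encoding encoding)
        {
            using (var ms = new MemoryStream())
            {
                using (var sw = encoding == null
                    ? new StreamWriter(ms)
                    : new StreamWriter(ms, encoding))
                {
                    sw.Write(text);
                }
                // ToArray still works after the writer has closed the stream.
                return ms.ToArray();
            }
        }
    }
    ```

    Encode("A", Encoding.UTF8) should return four bytes (EF BB BF 41), while Encode("A", new UTF8Encoding(false)) and Encode("A", null) should each return the single byte 41.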

  • 2020-11-27 19:50

    It seems that if the file already existed and didn't contain a BOM, then it won't contain a BOM when overwritten. In other words, StreamWriter preserves the BOM (or its absence) when overwriting a file.
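    This observation is easy to test. The sketch below seeds a file without a BOM and then reopens it with Encoding.UTF8, either appending or overwriting. OverwriteDemo and BomAfterRewrite are hypothetical names, and the sketch assumes the path-based StreamWriter(String, Boolean, Encoding) constructor is the one in use.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    static class OverwriteDemo
    {
        // Seeds the file with BOM-less content, reopens it with a BOM-emitting
        // encoding, and reports whether the result starts with EF BB BF.
        public static bool BomAfterRewrite(string path, bool append)
        {
            File.WriteAllBytes(path, new byte[] { 0x78 }); // "x", no BOM

            using (var sw = new StreamWriter(path, append, Encoding.UTF8))
            {
                sw.Write("y");
            }

            byte[] bytes = File.ReadAllBytes(path);
            return bytes.Length >= 3
                && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        }
    }
    ```

    As far as I can tell, BomAfterRewrite(path, true) returns false and BomAfterRewrite(path, false) returns true. That would suggest a missing BOM comes from appending to existing content, or from an encoding without a preamble, rather than from StreamWriter inspecting the old file.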

  • 2020-11-27 19:57

    I found this answer useful (thanks to @Philipp Grathwohl and @Nik), but in my case I'm using FileStream for the task, so the code that writes the BOM looks like this:

    using (FileStream vStream = File.Create(pfilePath))
    {
        // Creates the UTF-8 encoding with parameter "encoderShouldEmitUTF8Identifier" set to true
        Encoding vUTF8Encoding = new UTF8Encoding(true);
        // Gets the preamble in order to attach the BOM
        var vPreambleByte = vUTF8Encoding.GetPreamble();
    
        // Writes the preamble first
        vStream.Write(vPreambleByte, 0, vPreambleByte.Length);
    
        // Gets the bytes from text
        byte[] vByteData = vUTF8Encoding.GetBytes(pTextToSaveToFile);
        vStream.Write(vByteData, 0, vByteData.Length);
        // No explicit Close() here: leaving the using block disposes the stream
    }
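    For comparison, when the whole text is already in memory, File.WriteAllText with an explicit encoding should produce the same bytes in one call. A minimal sketch; BomFile and WriteWithBom are names invented here.

    ```csharp
    using System.IO;
    using System.Text;

    static class BomFile
    {
        // Writes text as UTF-8 with a leading BOM (EF BB BF) in a single call.
        public static void WriteWithBom(string path, string text)
        {
            File.WriteAllText(path, text, new UTF8Encoding(true));
        }
    }
    ```

    The manual preamble approach above is still the one to use when you already hold an open FileStream and need to keep writing to it.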
    
  • 2020-11-27 19:59

    The only time I've seen that constructor not add the UTF-8 BOM is if the stream is not at position 0 when you call it. For example, in the code below, the BOM isn't written:

    using (var s = File.Create("test2.txt"))
    {
        s.WriteByte(32);
        using (var sw = new StreamWriter(s, Encoding.UTF8))
        {
            sw.WriteLine("hello, world");
        }
    }
    

    As others have said, if you're using the StreamWriter(stream) constructor, without specifying the encoding, then you won't see the BOM.
