Why is StreamReader and sr.BaseStream.Seek() giving Junk Characters even in UTF8 Encoding

Submitted by 妖精的绣舞 on 2020-05-16 08:14:46

Question


The contents of abc.txt are:

ABCDEFGHIJ•XYZ

The character is shown correctly if I use this code (i.e. seek to position 9):

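            // Requires: using System.IO; using System.Text; using System.Windows.Forms;
            string filePath = "D:\\abc.txt";
            FileStream fs = new FileStream(filePath, FileMode.Open);
            // Decode as UTF-8 and let the reader detect a byte order mark if present.
            StreamReader sr = new StreamReader(fs, new UTF8Encoding(true), true);
            // Seek works on the underlying stream, so 9 is a byte offset, not a character index.
            sr.BaseStream.Seek(9, SeekOrigin.Begin);
            char[] oneChar = new char[1];
            // Read returns the number of chars read; the char itself lands in oneChar[0].
            sr.Read(oneChar, 0, 1);
            MessageBox.Show(oneChar[0].ToString());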

But if the seek position is just after the special dot character, I get a junk character. That is, I get a junk character if I seek to position 11 (i.e. just after the dot's position):

sr.BaseStream.Seek(11, SeekOrigin.Begin);

I expected this to give 'X', because the character at position 11 is X.

I believe the file contents are valid UTF-8.

One more thing: the StreamReader's BaseStream length and the length of the StreamReader's contents are different.

   MessageBox.Show(sr.BaseStream.Length.ToString());
   MessageBox.Show(sr.ReadToEnd().Length.ToString());

Answer 1:


Why is StreamReader and sr.BaseStream.Seek() giving Junk Characters even in UTF8 Encoding

It is exactly because of UTF-8 that sr.BaseStream is giving junk characters. :)

StreamReader is the relatively "smarter" layer here. It understands text and encodings, whereas FileStream (i.e. sr.BaseStream) doesn't. FileStream only knows about bytes.

Since your file is encoded in UTF-8 (a variable-length encoding), letters like A, B and C are encoded with 1 byte each, but the • character needs 3 bytes. You can find out how many bytes a character needs with:

Console.WriteLine(Encoding.UTF8.GetByteCount("•"));

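So when you move the stream to "the position just after •", you haven't actually moved past the •; you are only on the second byte of it. Assuming the file has no byte order mark, the • occupies bytes 10 to 12, and X starts at byte 13.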
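To make the byte arithmetic concrete, here is a minimal sketch that converts a character index into a byte offset before seeking. It assumes the text is known in advance and is encoded as UTF-8 without a byte order mark. It uses a MemoryStream instead of D:\abc.txt so it can run on its own. The class Utf8SeekDemo and its two helper methods are hypothetical names made up for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8SeekDemo
{
    // Hypothetical helper: the byte offset at which the character with index
    // charIndex starts, assuming UTF-8 without a byte order mark.
    public static long ByteOffsetOfChar(string text, int charIndex)
    {
        return Encoding.UTF8.GetByteCount(text.Substring(0, charIndex));
    }

    // Hypothetical helper: encodes text as UTF-8, seeks to byteOffset in the
    // resulting bytes, and decodes a single character from there.
    public static char ReadCharAtByteOffset(string text, long byteOffset)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(text);
        using (var ms = new MemoryStream(bytes))
        using (var sr = new StreamReader(ms, new UTF8Encoding(false)))
        {
            // Seek before the first read, while the reader's buffer is still empty.
            ms.Seek(byteOffset, SeekOrigin.Begin);
            return (char)sr.Read();
        }
    }

    public static void Main()
    {
        string text = "ABCDEFGHIJ\u2022XYZ"; // \u2022 is the bullet character
        long offset = ByteOffsetOfChar(text, 11);
        Console.WriteLine(offset);                              // 13
        Console.WriteLine(ReadCharAtByteOffset(text, offset));  // X
        // Byte 11 is in the middle of the bullet, so decoding yields U+FFFD.
        Console.WriteLine((int)ReadCharAtByteOffset(text, 11)); // 65533
    }
}
```

Seeking on BaseStream after the reader has already buffered data would additionally require calling sr.DiscardBufferedData(). Note also that charIndex counts UTF-16 code units, so this sketch does not handle characters outside the Basic Multilingual Plane.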

The reason why the lengths are different is similar: sr.ReadToEnd().Length gives you the number of characters, whereas sr.BaseStream.Length gives you the number of bytes.
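The difference between the two lengths can be reproduced without a file. The sketch below assumes the file holds exactly the text from the question, with no byte order mark and no trailing newline; either of those would increase the byte count. LengthDemo and its methods are hypothetical names used only for this example.

```csharp
using System;
using System.Text;

public static class LengthDemo
{
    // Number of bytes the text occupies when encoded as UTF-8
    // (what BaseStream.Length reports for a file without a byte order mark).
    public static int ByteLength(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Number of chars in the decoded string (what ReadToEnd().Length reports).
    public static int CharLength(string text)
    {
        return text.Length;
    }

    public static void Main()
    {
        string text = "ABCDEFGHIJ\u2022XYZ"; // \u2022 is the bullet character
        Console.WriteLine(ByteLength(text)); // 16: 13 one-byte letters plus 3 bytes for the bullet
        Console.WriteLine(CharLength(text)); // 14
    }
}
```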



Source: https://stackoverflow.com/questions/60202410/why-is-streamreader-and-sr-basestream-seek-giving-junk-characters-even-in-utf8
