Question
The contents of abc.txt are:
ABCDEFGHIJ•XYZ
The character shown is correct if I use this code, which seeks to position 9:
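```csharp
// Needs: using System.IO; using System.Text; using System.Windows.Forms;
string filePath = "D:\\abc.txt";
using (FileStream fs = new FileStream(filePath, FileMode.Open))
using (StreamReader sr = new StreamReader(fs, new UTF8Encoding(true), true))
{
    sr.BaseStream.Seek(9, SeekOrigin.Begin);
    char[] oneChar = new char[1];
    int charsRead = sr.Read(oneChar, 0, 1); // returns how many chars were read, not the char itself
    MessageBox.Show(oneChar[0].ToString());
}
```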
But if the seek position is just after that special dot character, I get a junk character. For example, I get junk when I seek to position 11 (i.e. just after the dot):
```csharp
sr.BaseStream.Seek(11, SeekOrigin.Begin);
```
I expected this to give 'X', because X is the character at (zero-based) position 11.
As far as I can tell, the file contents are valid UTF-8.
One more thing: the length of the StreamReader's BaseStream differs from the length of the StreamReader's contents:
```csharp
MessageBox.Show(sr.BaseStream.Length.ToString());
MessageBox.Show(sr.ReadToEnd().Length.ToString());
```
Answer 1:
> Why is StreamReader and sr.BaseStream.Seek() giving junk characters even in UTF-8 encoding?
It is exactly because of UTF-8 that `sr.BaseStream` is giving junk characters. :)
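`StreamReader` is a relatively "smarter" reader: it decodes bytes into characters, so it understands how strings work, whereas `FileStream` (i.e. `sr.BaseStream`) doesn't. `FileStream` only knows about bytes, and the offset you pass to `Seek` is a byte offset, not a character index.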
Since your file is encoded in UTF-8 (a variable-length encoding), letters like `A`, `B` and `C` are encoded with 1 byte each, but the `•` character (U+2022) needs 3 bytes (`E2 80 A2`). You can get how many bytes a character needs with:
```csharp
Console.WriteLine(Encoding.UTF8.GetByteCount("•")); // prints 3
```
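To see where each character lands in the byte stream, the sketch below encodes the question's text in memory and prints the byte offset at which each character starts. It assumes abc.txt contains exactly `ABCDEFGHIJ•XYZ` with no byte-order mark; the variable names are illustrative only.

```csharp
using System;
using System.Text;

// Assumed file contents (no BOM), held in memory instead of reading D:\abc.txt.
string text = "ABCDEFGHIJ•XYZ";
byte[] bytes = Encoding.UTF8.GetBytes(text);

// Record the byte offset at which each character starts.
// text[i].ToString() is fine here because every character is in the BMP (no surrogate pairs).
int[] byteOffsets = new int[text.Length];
int offset = 0;
for (int i = 0; i < text.Length; i++)
{
    byteOffsets[i] = offset;
    int size = Encoding.UTF8.GetByteCount(text[i].ToString());
    Console.WriteLine($"char {i,2} '{text[i]}' starts at byte {offset,2} ({size} byte(s))");
    offset += size;
}

// Hex dump of the raw bytes a FileStream would see.
Console.WriteLine(BitConverter.ToString(bytes));
```

Under that assumption, `•` occupies bytes 10 to 12 and `X` starts at byte 13, so byte 11 falls inside the bullet.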
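So when you move the stream to "the position just after `•`", you haven't actually moved past the `•`; you are on its second byte. Decoding that starts on a UTF-8 continuation byte is invalid, so the decoder substitutes the replacement character U+FFFD, and that is the junk you see. `X` actually starts at byte 13 (ten 1-byte letters plus the 3-byte bullet).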
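A minimal sketch of seeking correctly, assuming the same BOM-less content held in a `MemoryStream` instead of D:\abc.txt. `ReadCharAtByte` is a hypothetical helper, not a framework API; it calls `StreamReader.DiscardBufferedData()` after seeking because the reader may still hold bytes or characters buffered from the previous position.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for abc.txt: the same text, UTF-8 encoded, no BOM (an assumption).
string text = "ABCDEFGHIJ•XYZ";
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(text));
using var reader = new StreamReader(stream, new UTF8Encoding(true), true);

// Hypothetical helper: seek the underlying stream to a byte offset, drop the
// reader's stale buffers, then decode one char from there.
char ReadCharAtByte(long byteOffset)
{
    reader.BaseStream.Seek(byteOffset, SeekOrigin.Begin);
    reader.DiscardBufferedData();
    char[] buffer = new char[1];
    int read = reader.Read(buffer, 0, 1);
    return read == 1 ? buffer[0] : '\0';
}

// Byte 11 is the middle of the bullet (E2 80 A2): decoding starts on a
// continuation byte, so the decoder substitutes U+FFFD.
char fromMiddle = ReadCharAtByte(11);

// To land on 'X' (character index 11), convert the character index to a byte
// offset first: the UTF-8 size of everything before it.
long xOffset = Encoding.UTF8.GetByteCount(text.Substring(0, 11));
char fromX = ReadCharAtByte(xOffset);

Console.WriteLine($"byte 11 -> U+{(int)fromMiddle:X4}, byte {xOffset} -> '{fromX}'");
```

Computing the byte offset from the prefix only works when you already know the text before the target character; to seek by character index in an arbitrary file you generally have to decode from the start (or from some known character boundary).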
The reason the lengths differ is similar: `StreamReader` gives you the number of characters, whereas `sr.BaseStream` gives you the number of bytes.
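The length difference can be reproduced with the same assumed content (UTF-8, no BOM, no trailing newline), again using a `MemoryStream` in place of the file:

```csharp
using System;
using System.IO;
using System.Text;

// Assumed file contents: the question's text, UTF-8, no BOM, no trailing newline.
byte[] fileBytes = Encoding.UTF8.GetBytes("ABCDEFGHIJ•XYZ");

using var stream = new MemoryStream(fileBytes);
using var reader = new StreamReader(stream, new UTF8Encoding(true), true);

long byteLength = reader.BaseStream.Length; // bytes: 10 + 3 + 3 = 16
int charLength = reader.ReadToEnd().Length; // UTF-16 chars: 14

Console.WriteLine($"BaseStream.Length = {byteLength}, ReadToEnd().Length = {charLength}");
```

Note that `String.Length` counts UTF-16 code units, so characters outside the Basic Multilingual Plane (most emoji, for example) would count as 2. If the file did start with a UTF-8 BOM, `BaseStream.Length` would include those 3 bytes, while the `StreamReader` skips them when decoding.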
Source: https://stackoverflow.com/questions/60202410/why-is-streamreader-and-sr-basestream-seek-giving-junk-characters-even-in-utf8