StreamReader is too greedy

Submitted by 此生再无相见时 on 2020-01-30 08:06:06

Question


I'm trying to process part of a text file, and write the remainder of the text file to a cloud blob using UploadFromStream. The problem is that the StreamReader appears to be grabbing too much content from the underlying stream, and so the subsequent write does nothing.

Text file:

3
Col1,String
Col2,Integer
Col3,Boolean
abc,123,True
def,3456,False
ghijkl,532,True
mnop,1211,False

Code:

using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
using (var reader = new StreamReader(stream))
{
    var numColumns = int.Parse(reader.ReadLine());
    while (numColumns-- > 0)
    {
        var colDescription = reader.ReadLine();
        // do stuff
    }

    // Write remaining contents to another file, for testing
    using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
    {
        stream.CopyTo(destination);
        destination.Flush();
    }

    // Actual intended usage:
    // CloudBlockBlob blob = ...;
    // blob.UploadFromStream(stream);
}

When debugging, I observe that stream.Position jumps to the end of the file on the first call to reader.ReadLine(), which I don't expect. I expected the stream to be advanced only as many positions as the reader needed to read some content.

I imagine that the stream reader is doing some buffering for performance reasons, but there doesn't seem to be a way to ask the reader where in the underlying stream it "really" is. (If there were, I could manually Seek the stream to that position before calling CopyTo.)

I know that I could keep taking lines using the same reader and sequentially append them to the text file I'm writing, but I'm wondering whether there's a cleaner way.
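One way to get the "Seek to where the reader really is" behaviour, which neither answer below uses and which is offered here only as a sketch, is to skip StreamReader for the header entirely and read the header lines byte by byte from the stream itself. The stream is then positioned exactly after the last header line, so CopyTo picks up the remainder. `UnbufferedHeader` and its methods are hypothetical helpers, not a .NET API, and the sketch assumes an ASCII header with no byte order mark and "\n" or "\r\n" line endings.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not part of .NET): reads header lines straight from the
// Stream with no StreamReader buffering, so the stream stays positioned exactly
// after the last header line. Assumes ASCII text, no BOM, "\n" or "\r\n" endings.
public static class UnbufferedHeader
{
    // Consumes bytes up to and including the next '\n'; returns null at end of stream.
    public static string ReadLine(Stream stream)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = stream.ReadByte()) != -1 && b != '\n')
        {
            bytes.Add((byte)b);
        }
        if (b == -1 && bytes.Count == 0)
        {
            return null;
        }
        // Drop the '\r' of a Windows-style "\r\n" ending.
        if (bytes.Count > 0 && bytes[bytes.Count - 1] == '\r')
        {
            bytes.RemoveAt(bytes.Count - 1);
        }
        return Encoding.ASCII.GetString(bytes.ToArray());
    }

    // Parses the column count and column descriptions, then copies the
    // untouched remainder of `input` to `destination`.
    public static List<string> CopyAfterHeader(Stream input, Stream destination)
    {
        var columns = new List<string>();
        int numColumns = int.Parse(ReadLine(input));
        while (numColumns-- > 0)
        {
            columns.Add(ReadLine(input));
        }
        input.CopyTo(destination);
        return columns;
    }
}
```

With the file from the question you would pass `File.OpenRead("c:\\test\\testinput.txt")` as `input`. FileStream keeps its own internal buffer, so the ReadByte calls are not one disk read each, and its Position still reports only the bytes actually returned, which is why the subsequent CopyTo starts in the right place.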

EDIT:

I found a StreamReader constructor which leaves the underlying stream open when it is disposed, so I tried this, hoping that the reader would set the stream's position as it's being disposed:

using (var stream = File.OpenRead("c:\\test\\testinput.txt"))
{
    using (var reader = new StreamReader(stream, Encoding.UTF8, 
        detectEncodingFromByteOrderMarks: true, 
        bufferSize: 1 << 12, 
        leaveOpen: true))
    {
        var numColumns = int.Parse(reader.ReadLine());
        while (numColumns-- > 0)
        {
            var colDescription = reader.ReadLine();
            // do stuff
        }
    }

    // Write remaining contents to another file
    using (var destination = File.OpenWrite("c:\\test\\testoutput.txt"))
    {
        stream.CopyTo(destination);
        destination.Flush();
    }
}

But it doesn't. Why would this constructor be exposed if it doesn't leave the stream in an intuitive state/position?
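As far as the documented behaviour goes, `leaveOpen` only decides whether disposing the reader also closes the stream (useful when the stream belongs to someone else, or when you want to keep using it, for example by seeking back to 0). It does not rewind the stream to the end of the last line returned. A small sketch that shows this with an in-memory copy of the sample file; `LeaveOpenDemo` is an illustrative name, not from the thread:

```csharp
using System.IO;
using System.Text;

public static class LeaveOpenDemo
{
    // Reads one line through a StreamReader created with leaveOpen: true,
    // disposes the reader, and returns the stream's position afterwards.
    public static long PositionAfterReaderDisposed(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true,
            bufferSize: 1 << 12,
            leaveOpen: true))
        {
            reader.ReadLine();
        }
        // The stream is still open (leaveOpen: true), but disposing the reader did
        // not move it back to the end of the line that ReadLine returned.
        return stream.Position;
    }
}
```

Because the whole sample input fits in one 4 KB buffer, the returned position equals the length of the input, not the 2 bytes of "3\n".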


Answer 1:


Sure, there's a cleaner way. Use ReadToEnd to read the remaining data, and then write it to a new file. For example:

using (var reader = new StreamReader("c:\\test\\testinput.txt"))
{
    var numColumns = int.Parse(reader.ReadLine());
    while (numColumns-- > 0)
    {
        var colDescription = reader.ReadLine();
        // do stuff
    }

    // write everything else to another file.
    File.WriteAllText("c:\\test\\testoutput.txt", reader.ReadToEnd());
}

Edit after comment

If you want to upload the remaining text to the blob instead of writing it to a file, you could replace File.WriteAllText with code that reads the remaining text, writes it to a StreamWriter backed by a MemoryStream, rewinds the MemoryStream, and then uploads its contents. Something like:

    using (var memStream = new MemoryStream())
    {
        using (var writer = new StreamWriter(memStream))
        {
            writer.Write(reader.ReadToEnd());
            writer.Flush();
            memStream.Position = 0;
            blob.UploadFromStream(memStream);
        }
    }
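A slightly shorter variant of the same idea, offered as a sketch rather than as the answerer's code, encodes the remaining text directly into a byte array and wraps it in a MemoryStream, which avoids the StreamWriter and its disposal order. `RemainderToStream` is a hypothetical name; the result would be passed to `blob.UploadFromStream(...)` from the question, which is not called here.

```csharp
using System.IO;
using System.Text;

public static class RemainderToStream
{
    // Encodes whatever the reader has not consumed yet as UTF-8 and wraps it in a
    // read-only MemoryStream positioned at 0. Encoding.GetBytes never writes a
    // byte order mark, so the bytes match what a default StreamWriter produces.
    public static MemoryStream ToMemoryStream(TextReader reader)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(reader.ReadToEnd());
        return new MemoryStream(bytes, writable: false);
    }
}
```

Like the original snippet, this still holds the whole remainder in memory as a string and as bytes, so it suits files that comfortably fit in RAM.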



Answer 2:


You should not access the underlying stream of a StreamReader while you are still reading through the reader. Trying to use both leads to unpredictable results.

What's going on here is that the reader is buffering the data from the underlying stream. It doesn't read each byte exactly when you request it, because that's often going to be very inefficient. Instead it will grab chunks, put them in a buffer, and then provide you with data from that buffer, grabbing a new chunk when it needs to.
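To make the chunking visible, here is a small sketch (the class name `BufferingDemo` is illustrative, not from the thread) that reads one 13-byte line from a 13,000-byte in-memory input and reports how far the underlying stream has moved. The exact chunk size depends on the StreamReader buffer size, so the sketch only claims that the position lands somewhere past the line but before the end.

```csharp
using System.IO;
using System.Text;

public static class BufferingDemo
{
    // Builds an ASCII input of `lineCount` identical lines, reads a single line
    // through a StreamReader, and reports how far the underlying stream advanced.
    public static (int lineBytes, long streamPosition, long streamLength) ReadOneLine(int lineCount)
    {
        var text = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            text.Append("abc,123,True\n");
        }
        var stream = new MemoryStream(Encoding.ASCII.GetBytes(text.ToString()));
        using (var reader = new StreamReader(stream))
        {
            string line = reader.ReadLine();
            // The line is 12 characters (13 bytes with '\n'), but the reader has
            // already pulled a whole buffer-sized chunk from the stream.
            return (line.Length + 1, stream.Position, stream.Length);
        }
    }
}
```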

You should continue to use the StreamReader throughout the remainder of that block, instead of using stream. To minimize the memory footprint of the program, the most effective way of doing this would be to read the next line from the reader in a loop until it hits the end of the file, writing each line to the output as you go.
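A minimal sketch of that loop, written against TextReader and TextWriter so it works with a StreamReader/StreamWriter pair over files as well as with in-memory readers; `LineCopier` is a hypothetical name.

```csharp
using System.IO;

public static class LineCopier
{
    // Copies every remaining line from `reader` to `writer`, holding only one line
    // in memory at a time. Each line is written with writer.NewLine, so "\r\n" and
    // "\n" endings are normalized and a final newline is always emitted.
    public static int CopyRemainingLines(TextReader reader, TextWriter writer)
    {
        int count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(line);
            count++;
        }
        return count;
    }
}
```

In the question's code you would call this after the header loop, with a `new StreamWriter("c:\\test\\testoutput.txt")` as the writer. If the exact original line endings matter, copying fixed-size character blocks with `reader.Read(buffer, 0, buffer.Length)` keeps them intact at the same memory cost.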

Also note that you don't need to dispose of both the stream reader and the underlying stream. The stream reader will dispose of the underlying stream itself (unless it was created with leaveOpen: true), so you can simply change the using statement to:

using (var reader = new StreamReader(
    File.OpenRead("c:\\test\\testinput.txt")))


Source: https://stackoverflow.com/questions/22254585/streamreader-is-too-greedy
