Stitching together multiple streams in one Stream class

忘掉有多难 2021-01-14 23:39

I want to make a class (let's call it HugeStream) that takes an IEnumerable<Stream> in its constructor. This HugeStream should implement

1 answer
  • 2021-01-14 23:44

    If you only read data sequentially from the HugeStream, it simply needs to read each child stream until that stream is exhausted (appending the data to a local cache file as well as returning it to the caller), then move on to the next child stream. If a Seek operation jumps "backwards" in the data, you must start reading from the local cache file; when you reach the end of the cache file, you resume reading the current child stream where you left off.

    So far, this is all pretty straightforward to implement: you just need to redirect each Read call to the appropriate stream, and switch streams as each one runs out of data.

    The inefficiency in the article you quoted is that it runs through all the streams on every read to work out where to continue reading from. To improve on this, open the child streams only as you need them, and keep track of the currently open stream so you can keep reading from it until it is exhausted. Then open the next stream as your "current" stream and carry on. Because you have a linear sequence of streams, you can simply step through them one by one, something like this:

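    private readonly IList<Stream> childStreams;  // the streams passed to the constructor
    private readonly Stream cacheStream;          // local file that records everything read so far
    private int currentStreamIndex = 0;
    private Stream currentStream;                 // opened lazily; null until it is first needed
    
    // ... constructor and the remaining Stream members ...
    
    public override int Read(byte[] buffer, int offset, int count)
    {
        int startOffset = offset;
        int totalBytesRead = 0;
    
        while (count > 0)
        {
            // Open the next child stream only when we need it
            if (currentStream == null)
            {
                // If we run out of child streams to read from, we're at the end of the HugeStream, and there is no more data to read
                if (currentStreamIndex >= childStreams.Count)
                    break;
                currentStream = childStreams[currentStreamIndex++];
            }
    
            // Read what we can from the current stream
            int numBytesRead = currentStream.Read(buffer, offset, count);
            count -= numBytesRead;
            offset += numBytesRead;
            totalBytesRead += numBytesRead;
    
            // Read returns 0 only when the child stream is exhausted (a short read does not mean that).
            // Close it so the next loop iteration moves on to the next child stream.
            if (numBytesRead == 0)
            {
                currentStream.Dispose();
                currentStream = null;
            }
        }
    
        // Here, write the data you've just read (into buffer) to your local cache stream
        cacheStream.Write(buffer, startOffset, totalBytesRead);
        return totalBytesRead;
    }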
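
    The question's constructor takes an IEnumerable<Stream> rather than an indexable list. A minimal sketch, assuming only forward, sequential reads are needed, is to drive the same loop from an IEnumerator<Stream>, so each child is only pulled from the sequence (and, if the sequence is lazy, only opened) once the previous child has returned 0 from Read. The class name ConcatStream is hypothetical, and the cache is deliberately left out here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: a read-only, forward-only stream that concatenates child streams,
// pulling each child from the enumerable only when the previous one is exhausted.
public sealed class ConcatStream : Stream
{
    private readonly IEnumerator<Stream> children;
    private Stream current;   // child currently being read, or null
    private bool finished;    // true once every child has been consumed
    private long position;    // number of bytes returned so far

    public ConcatStream(IEnumerable<Stream> childStreams)
    {
        children = childStreams.GetEnumerator();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (count > 0 && !finished)
        {
            if (current == null)
            {
                // Enumerate (and so possibly open) the next child only now
                if (!children.MoveNext()) { finished = true; break; }
                current = children.Current;
            }

            int n = current.Read(buffer, offset, count);
            if (n == 0)
            {
                // Only a 0-byte read means this child is exhausted
                current.Dispose();
                current = null;
                continue;
            }
            offset += n;
            count -= n;
            total += n;
        }
        position += total;
        return total;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            current?.Dispose();
            children.Dispose();
        }
        base.Dispose(disposing);
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => position;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

    Children that were never enumerated are not disposed by this sketch; if the enumerable opens streams eagerly, you would need to handle that separately.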
    

    To allow seeking backwards, introduce a local file stream (the cache) and copy all the data into it as you read (see the comment at the end of the code above). You also need a state that tells you whether you are reading from the cache file or from the current child stream. Reading from the cache is easy: because it holds the entire history of the data read from the HugeStream, seek offsets are identical in the HugeStream and in the cache, so you simply redirect Read calls to the cache stream.

    If a read or seek brings you back to the end of the cache stream, resume reading data from the current child stream: go back to the logic above and continue appending data to the cache stream.
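
    Putting the cache and the "current child" state together, here is a minimal sketch of the backward-seek behaviour described above. Assumptions: the class name CachedConcatStream is hypothetical; the cache is any readable, writable, seekable Stream supplied by the caller (a FileStream on a temporary file in practice, a MemoryStream in a test); and seeking past the end of the cached data is rejected rather than implemented. The logical position doubles as the cache offset, since the cache holds the whole history of the HugeStream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: concatenates child streams and copies every byte read into a cache
// stream, so earlier data can be re-read after a backward Seek.
public sealed class CachedConcatStream : Stream
{
    private readonly IEnumerator<Stream> children;
    private readonly Stream cache;  // must be readable, writable and seekable
    private Stream current;         // child stream currently being read, or null
    private long position;          // logical position, identical to the offset in the cache

    public CachedConcatStream(IEnumerable<Stream> childStreams, Stream cache)
    {
        children = childStreams.GetEnumerator();
        this.cache = cache;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        int n;
        if (position < cache.Length)
        {
            // Behind the end of the cache: serve the request from cached data only
            cache.Position = position;
            n = cache.Read(buffer, offset, (int)Math.Min(count, cache.Length - position));
        }
        else
        {
            // At the end of the cache: resume the current child and append what we read
            n = ReadFromChildren(buffer, offset, count);
            cache.Position = cache.Length;
            cache.Write(buffer, offset, n);
        }
        position += n;
        return n;
    }

    // Returns bytes from the next non-exhausted child, or 0 once all children are exhausted.
    private int ReadFromChildren(byte[] buffer, int offset, int count)
    {
        while (true)
        {
            if (current == null)
            {
                if (!children.MoveNext()) return 0;
                current = children.Current;
            }
            int n = current.Read(buffer, offset, count);
            if (n > 0) return n;
            current.Dispose();  // child exhausted: move on to the next one
            current = null;
        }
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => position + offset,
            _ => throw new NotSupportedException("Total length is unknown in this sketch"),
        };
        if (target < 0)
            throw new IOException("Cannot seek before the beginning of the stream");
        // Only positions inside the cache, or exactly at its end, are reachable here
        if (target > cache.Length)
            throw new NotSupportedException("Seeking past the cached data is not supported in this sketch");
        position = target;
        return position;
    }

    public override long Position { get => position; set => Seek(value, SeekOrigin.Begin); }
    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        // This sketch takes ownership of the cache and disposes it too
        if (disposing) { current?.Dispose(); children.Dispose(); cache.Dispose(); }
        base.Dispose(disposing);
    }
}
```

    Read never mixes the two sources in one call: it returns cached bytes up to the end of the cache and lets the caller come back for more, which the Stream contract allows. Length throws because the total size is unknown, so SeekOrigin.End is not supported either.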

    If you want full random access within the HugeStream, you will also need to support seeking "forwards" (beyond the current end of the cache stream). If you don't know the lengths of the child streams beforehand, you have no choice but to keep reading data into the cache until you reach the seek offset. If you do know the sizes of all the streams, you could seek directly (and more efficiently) to the right place, but you would then need an efficient way to store the data you read in the cache file and to record which parts of the cache file contain valid data and which have not yet been read from the DB. This is a bit more advanced.
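
    For the case where the child lengths are unknown, a forward Seek has to pull data through until the cache reaches the target offset. A minimal sketch of that step as a standalone helper, under these assumptions: CacheHelpers and FillCacheTo are hypothetical names, source stands for the not-yet-cached remainder of the HugeStream (in a real implementation, the child-stream reading path), and the caller sets its logical position to the value returned.

```csharp
using System;
using System.IO;

public static class CacheHelpers
{
    // Hypothetical helper: appends data from 'source' to the end of 'cache' until the cache
    // covers 'target' or the source runs out. Returns the position actually reachable:
    // 'target', or the cache length if the data ended first.
    public static long FillCacheTo(Stream source, Stream cache, long target, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        cache.Position = cache.Length;  // always append to the end of the cache
        while (cache.Length < target)
        {
            // Never read past the target, so no extra data is pulled from the children
            int wanted = (int)Math.Min(buffer.Length, target - cache.Length);
            int n = source.Read(buffer, 0, wanted);
            if (n == 0) break;          // end of data: target lies past the end of the HugeStream
            cache.Write(buffer, 0, n);
        }
        return Math.Min(target, cache.Length);
    }
}
```

    Clamping to the cache length mirrors how seeking past the end of a file-backed stream would otherwise leave the position pointing at data that does not exist.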

    I hope that makes sense to you and gives you a better idea of how to proceed...

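    (Apart from trivial overrides of the other abstract Stream members, such as CanRead, CanSeek, CanWrite, Position and Flush, you shouldn't need to implement much more than Read and Seek to get this working.)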
