Reading large text files with streams in C#

野的像风 2020-11-22 08:28

I've got the lovely task of working out how to handle large files being loaded into our application's script editor (it's like VBA for our internal product for quick macr

11 Answers
  • 2020-11-22 08:59

    For binary files, the fastest way of reading them I have found is this.

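     // Requires: using System.IO; using System.IO.MemoryMappedFiles;
     using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(file))
     using (MemoryMappedViewStream mms = mmf.CreateViewStream())
     using (BinaryReader b = new BinaryReader(mms))
     {
         // Read from b here, e.g. b.ReadBytes(count) or b.ReadInt32()
     }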
    

    In my tests it's hundreds of times faster.
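    As a minimal sketch of what could go inside that using block, assume the file is nothing but a run of 32-bit integers. That layout, and the names MappedBinaryDemo and SumInt32s, are made up for illustration. The count comes from the file's length rather than from reading until the stream ends, because a mapped view can be rounded up to a page boundary.

    ```csharp
    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    static class MappedBinaryDemo
    {
        // Sums every 32-bit integer in the file (hypothetical record layout).
        // Note: CreateFromFile throws for a zero-length file.
        public static long SumInt32s(string path)
        {
            // Take the value count from the real file length; the mapped view
            // itself may be padded up to a page boundary.
            long count = new FileInfo(path).Length / sizeof(int);
            long sum = 0;
            using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
            using (var view = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read))
            using (var reader = new BinaryReader(view))
            {
                for (long i = 0; i < count; i++)
                    sum += reader.ReadInt32();
            }
            return sum;
        }
    }
    ```

    Whether this beats a plain FileStream depends on the file and the access pattern, so it is worth measuring on your own data.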

  • 2020-11-22 09:01

    Use a background worker and read only a limited number of lines. Read more only when the user scrolls.

    And try never to use ReadToEnd(). It's one of those functions that makes you think "why did they make it?"; it's a script kiddies' helper that works fine for small things, but, as you can see, it performs badly on large files.

    Those guys telling you to use StringBuilder need to read the MSDN more often:

    Performance Considerations
    The Concat and AppendFormat methods both concatenate new data to an existing String or StringBuilder object. A String object concatenation operation always creates a new object from the existing string and the new data. A StringBuilder object maintains a buffer to accommodate the concatenation of new data. New data is appended to the end of the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, then the new data is appended to the new buffer. The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs.
    A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated. In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input.

    That means a huge allocation of memory, which leads to heavy use of the swap file, where part of your hard disk drive is used to simulate RAM; but a hard disk drive is very slow.

    The StringBuilder option looks fine if you are the only user of the system, but when two or more users are reading large files at the same time, you have a problem.
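    A rough sketch of the "read only a limited number of lines, read more when the user scrolls" idea follows. PagedLineReader and ReadNextPage are hypothetical names, not part of the framework; the editor would call ReadNextPage from its scroll handler, or from a BackgroundWorker's DoWork handler, and append the result to the view.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;

    // Hypothetical helper: hands out a file's lines one page at a time.
    sealed class PagedLineReader : IDisposable
    {
        private readonly StreamReader _reader;

        public PagedLineReader(string path)
        {
            _reader = new StreamReader(path);
        }

        // Reads up to maxLines further lines; an empty list means end of file.
        public List<string> ReadNextPage(int maxLines)
        {
            var page = new List<string>(maxLines);
            string line;
            while (page.Count < maxLines && (line = _reader.ReadLine()) != null)
                page.Add(line);
            return page;
        }

        public void Dispose()
        {
            _reader.Dispose();
        }
    }
    ```

    Only the lines already requested are held in memory, so the cost grows with how far the user scrolls rather than with the file size.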

  • 2020-11-22 09:01

    My file is over 13 GB:

    The link below contains code that reads a piece of the file easily:

    Read a large text file

    More information
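    The linked code is not reproduced on this page, so the following is not that code. It is only a sketch of one common way to read a piece of a very large file: seek to a byte offset and read a fixed number of bytes. FileChunk and ReadChunk are hypothetical names.

    ```csharp
    using System;
    using System.IO;

    static class FileChunk
    {
        // Reads up to count bytes starting at the given byte offset.
        // Returns fewer bytes (possibly none) when the end of the file is reached.
        public static byte[] ReadChunk(string path, long offset, int count)
        {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                fs.Seek(offset, SeekOrigin.Begin);
                var buffer = new byte[count];
                int total = 0;
                int read;
                // Read may return fewer bytes than requested, so loop until done.
                while (total < count && (read = fs.Read(buffer, total, count - total)) > 0)
                    total += read;
                if (total < count) Array.Resize(ref buffer, total);
                return buffer;
            }
        }
    }
    ```

    The chunk is raw bytes; if the file is text in a multi-byte encoding, a chunk boundary can fall in the middle of a character, so decode with care.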

  • 2020-11-22 09:03

    If you read the performance and benchmark stats on this website, you'll see that the fastest way to read (because reading, writing, and processing are all different) a text file is the following snippet of code:

    using (StreamReader sr = File.OpenText(fileName))
    {
        string s;
        while ((s = sr.ReadLine()) != null)
        {
            // Process the current line here
        }
    }
    

    In all, about nine different methods were benchmarked, but that one seems to come out ahead most of the time, even outperforming the buffered reader that other answers have mentioned.
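    Benchmark results like these depend on the machine, the disk, and the file, so it may be worth timing the loop on your own data. A minimal sketch, with LineReadTiming and CountLines as made-up names:

    ```csharp
    using System;
    using System.Diagnostics;
    using System.IO;

    static class LineReadTiming
    {
        // Counts lines with the ReadLine loop and reports how long the pass took.
        public static long CountLines(string fileName, out TimeSpan elapsed)
        {
            var watch = Stopwatch.StartNew();
            long lines = 0;
            using (StreamReader sr = File.OpenText(fileName))
            {
                while (sr.ReadLine() != null)
                    lines++;
            }
            watch.Stop();
            elapsed = watch.Elapsed;
            return lines;
        }
    }
    ```

    Run it more than once before comparing numbers, since the operating system's file cache usually makes the second pass over the same file faster than the first.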

  • 2020-11-22 09:13

    An iterator might be perfect for this type of work:

    // Requires: using System; using System.Collections.Generic; using System.IO; using System.Text;
    public static IEnumerable<int> LoadFileWithProgress(string filename, StringBuilder stringData)
    {
        const int charBufferSize = 4096;
        using (FileStream fs = File.OpenRead(filename))
        using (BinaryReader br = new BinaryReader(fs)) // decodes as UTF-8 by default
        {
            // fs.Length is in bytes, so the chunk count (and the progress) is an estimate
            // whenever a character takes more than one byte.
            long length = fs.Length;
            int numberOfChunks = Convert.ToInt32(length / charBufferSize) + 1;
            double iter = 100 / Convert.ToDouble(numberOfChunks);
            double currentIter = 0;
            yield return Convert.ToInt32(currentIter);
            while (true)
            {
                char[] buffer = br.ReadChars(charBufferSize);
                if (buffer.Length == 0) break; // end of file
                stringData.Append(buffer);
                currentIter += iter;
                yield return Convert.ToInt32(currentIter);
            }
        }
    }
    

    You can call it using the following:

    string filename = "C:\\myfile.txt";
    StringBuilder sb = new StringBuilder();
    foreach (int progress in LoadFileWithProgress(filename, sb))
    {
        // Update your progress counter here!
    }
    string fileData = sb.ToString();
    

    As the file is loaded, the iterator will return the progress number from 0 to 100, which you can use to update your progress bar. Once the loop has finished, the StringBuilder will contain the contents of the text file.

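    Also, because you want text, we can use BinaryReader to read characters rather than bytes, which keeps multi-byte characters from being split across buffer boundaries. BinaryReader decodes as UTF-8 unless you pass a different Encoding (for example, Encoding.Unicode for UTF-16) to its constructor.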

    This is all done without using background tasks, threads, or complex custom state machines.
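    One possible variation, sketched here under assumptions rather than taken from the answer, derives the percentage from how many bytes the underlying stream has consumed instead of from an estimated chunk count. ProgressLoader and LoadWithProgress are hypothetical names, and UTF-8 with byte-order-mark detection is assumed. StreamReader reads ahead into its own buffer, so the reported value can run slightly ahead of the text appended so far.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    static class ProgressLoader
    {
        // Appends the decoded file text to target and yields percent complete (0..100).
        public static IEnumerable<int> LoadWithProgress(string path, StringBuilder target)
        {
            using (FileStream fs = File.OpenRead(path))
            using (var reader = new StreamReader(fs, Encoding.UTF8, true))
            {
                long length = fs.Length;
                var buffer = new char[4096];
                yield return 0;
                int read;
                // For an empty file the loop body never runs, so length is never zero here.
                while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    target.Append(buffer, 0, read);
                    // fs.Position is the number of bytes consumed from the file so far.
                    yield return (int)(fs.Position * 100 / length);
                }
            }
        }
    }
    ```

    It is called the same way as the iterator above: loop over the returned values with foreach to update the progress bar, then take the text from the StringBuilder.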

  • 2020-11-22 09:14

    You might be better off using memory-mapped file handling here. Memory-mapped file support will be available in .NET 4 (I think; I heard that from someone else talking about it), hence this wrapper, which uses P/Invoke to do the same job.

    Edit: See here on MSDN for how it works; here's the blog entry indicating how it is done in the upcoming .NET 4 when it is released. The link I gave earlier is a wrapper around the P/Invoke calls to achieve this. You can map the entire file into memory and view it like a sliding window when scrolling through the file.
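    A minimal sketch of the sliding-window idea, assuming the System.IO.MemoryMappedFiles namespace that ships with .NET 4 and later instead of the P/Invoke wrapper mentioned above. MappedWindow and ReadWindow are hypothetical names; a real editor would probably keep the mapping open between scrolls rather than reopening it on every call as this does.

    ```csharp
    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    static class MappedWindow
    {
        // Returns the bytes in [offset, offset + length), clamped to the end of the file.
        public static byte[] ReadWindow(string path, long offset, int length)
        {
            long fileLength = new FileInfo(path).Length;
            if (length <= 0 || offset < 0 || offset >= fileLength)
                return new byte[0];

            int size = (int)Math.Min(length, fileLength - offset);
            using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
            using (var view = mmf.CreateViewAccessor(offset, size, MemoryMappedFileAccess.Read))
            {
                // Copy just this window out of the mapping.
                var buffer = new byte[size];
                view.ReadArray(0, buffer, 0, size);
                return buffer;
            }
        }
    }
    ```

    As the user scrolls, the caller moves offset forward or backward and asks for the next window, so only the visible part of the file is copied into managed memory.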
