Reading large text files with streams in C#

野的像风 2020-11-22 08:28

I've got the lovely task of working out how to handle large files being loaded into our application's script editor (it's like VBA for our internal product for quick macr

11 answers
  • 2020-11-22 09:16

    You can improve read speed by using a BufferedStream, like this:

    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (BufferedStream bs = new BufferedStream(fs))
    using (StreamReader sr = new StreamReader(bs))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            // Process each line here.
        }
    }
    

    March 2013 UPDATE

    I recently wrote code for reading and processing (searching for text in) 1 GB-ish text files (much larger than the files involved here) and achieved a significant performance gain by using a producer/consumer pattern. The producer task read in lines of text using the BufferedStream and handed them off to a separate consumer task that did the searching.

    I used this as an opportunity to learn TPL Dataflow, which is very well suited for quickly coding this pattern.
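
    The producer/consumer split described above can be sketched without TPL Dataflow, using BlockingCollection<string> from the base class library instead. This is not the code from the original project, only a minimal illustration under assumptions: the names LineSearch and CountMatches and the bounded capacity of 10000 are made up for the example.

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    static class LineSearch
    {
        // Counts lines containing `needle`; reading and searching run on separate tasks.
        public static int CountMatches(string path, string needle)
        {
            // Bounded (assumed capacity) so the reader cannot run far ahead of the searcher.
            using (var lines = new BlockingCollection<string>(boundedCapacity: 10000))
            {
                var producer = Task.Run(() =>
                {
                    try
                    {
                        using (var sr = new StreamReader(path))
                        {
                            string line;
                            while ((line = sr.ReadLine()) != null)
                                lines.Add(line);
                        }
                    }
                    finally
                    {
                        // Always signal completion so the consumer loop can end.
                        lines.CompleteAdding();
                    }
                });

                var consumer = Task.Run(() =>
                {
                    int matches = 0;
                    foreach (string line in lines.GetConsumingEnumerable())
                        if (line.Contains(needle)) matches++;
                    return matches;
                });

                Task.WaitAll(producer, consumer);
                return consumer.Result;
            }
        }
    }
    ```

    Whether the split pays off depends on how expensive the per-line work is; for a cheap search the hand-off cost may outweigh the gain, so it is worth measuring.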

    Why BufferedStream is faster

    A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance. A buffer can be used for either reading or writing, but never both simultaneously. The Read and Write methods of BufferedStream automatically maintain the buffer.

    December 2014 UPDATE: Your Mileage May Vary

    Based on the comments, FileStream should be using a BufferedStream internally. At the time this answer was first provided, I measured a significant performance boost by adding a BufferedStream. At the time I was targeting .NET 3.x on a 32-bit platform. Today, targeting .NET 4.5 on a 64-bit platform, I do not see any improvement.
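
    FileStream keeps its own internal buffer (4096 bytes by default), and its size can be set through a constructor overload, which may make a separate BufferedStream unnecessary. A minimal sketch, assuming a 64 KB buffer and UTF-8 text; BufferedRead and CountLines are illustrative names and the buffer size is a guess to be measured, not a recommendation:

    ```csharp
    using System.IO;
    using System.Text;

    static class BufferedRead
    {
        // Counts lines, relying on FileStream's own buffer instead of a BufferedStream.
        public static long CountLines(string path, int bufferSize = 64 * 1024)
        {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.ReadWrite, bufferSize))
            using (var sr = new StreamReader(fs, Encoding.UTF8))
            {
                long count = 0;
                while (sr.ReadLine() != null) count++;
                return count;
            }
        }
    }
    ```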

    Related

    I came across a case where streaming a large, generated CSV file to the Response stream from an ASP.NET MVC action was very slow. Adding a BufferedStream improved performance by 100x in this instance. For more, see "Unbuffered Output Very Slow".

  • 2020-11-22 09:20

    This should be enough to get you started.

    using System;
    using System.IO;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            const int bufferSize = 1024;

            var sb = new StringBuilder();
            var buffer = new char[bufferSize];
            var length = 0L;     // file size in bytes, e.g. for a progress bar
            var totalRead = 0L;  // characters read so far (not bytes)
            int count;

            using (var sr = new StreamReader(@"C:\Temp\file.txt"))
            {
                length = sr.BaseStream.Length;
                // Read returns 0 at end of file.
                while ((count = sr.Read(buffer, 0, bufferSize)) > 0)
                {
                    sb.Append(buffer, 0, count);
                    totalRead += count;
                }
            }

            Console.ReadKey();
        }
    }
    
  • 2020-11-22 09:20

    Have a look at the following code snippet. You mentioned that most files will be 30-40 MB. This claims to read 180 MB in 1.4 seconds on an Intel Quad Core:

    private int _bufferSize = 16384;
    
    private void ReadFile(string filename)
    {
        StringBuilder stringBuilder = new StringBuilder();
        FileStream fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read);
    
        using (StreamReader streamReader = new StreamReader(fileStream))
        {
            char[] fileContents = new char[_bufferSize];
            int charsRead = streamReader.Read(fileContents, 0, _bufferSize);
    
            // Can't do much with 0 bytes
            if (charsRead == 0)
                throw new Exception("File is 0 bytes");
    
            while (charsRead > 0)
            {
                // Append only the characters actually read; the last chunk is usually partial.
                stringBuilder.Append(fileContents, 0, charsRead);
                charsRead = streamReader.Read(fileContents, 0, _bufferSize);
            }
        }
    }
    

    Original Article

  • 2020-11-22 09:23

    All excellent answers! However, for someone looking for an answer, these appear to be somewhat incomplete.

    A standard string can only hold a limited amount of text (on the order of 2 GB in .NET, depending on your configuration), so these answers do not really fulfil the OP's question. One method is to work with a list of strings:

    List<string> Words = new List<string>();

    using (StreamReader sr = new StreamReader(@"C:\Temp\file.txt"))
    {
        string line = string.Empty;

        while ((line = sr.ReadLine()) != null)
        {
            Words.Add(line);
        }
    }
    

    Some may want to tokenise and split each line when processing. The string list can now hold very large volumes of text.

  • 2020-11-22 09:26

    You say you have been asked to show a progress bar while a large file is loading. Is that because the users genuinely want to see the exact % of file loading, or just because they want visual feedback that something is happening?

    If the latter is true, then the solution becomes much simpler. Just do reader.ReadToEnd() on a background thread, and display a marquee-type progress bar instead of a proper one.

    I raise this point because in my experience this is often the case. When you are writing a data processing program, then users will definitely be interested in a % complete figure, but for simple-but-slow UI updates, they are more likely to just want to know that the computer hasn't crashed. :-)
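
    The background-thread approach could look roughly like the sketch below. It assumes Task.Run is available (.NET 4.5 or later); BackgroundLoad and LoadAsync are illustrative names, not part of any library.

    ```csharp
    using System;
    using System.IO;
    using System.Threading.Tasks;

    static class BackgroundLoad
    {
        // Reads the whole file on a thread-pool thread so the calling (UI) thread stays free.
        public static Task<string> LoadAsync(string path)
        {
            return Task.Run(() =>
            {
                using (var reader = new StreamReader(path))
                    return reader.ReadToEnd();
            });
        }
    }
    ```

    In a WinForms handler, for example, one might switch the progress bar to ProgressBarStyle.Marquee, await BackgroundLoad.LoadAsync(path), and then hide the bar once the text is available.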
