I\'ve got the lovely task of working out how to handle large files being loaded into our application\'s script editor (it\'s like VBA for our internal product for quick macr
You can improve read speed by using a BufferedStream, like this:
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
string line;
while ((line = sr.ReadLine()) != null)
{
}
}
March 2013 UPDATE
I recently wrote code for reading and processing (searching for text in) 1 GB-ish text files (much larger than the files involved here) and achieved a significant performance gain by using a producer/consumer pattern. The producer task read in lines of text using the BufferedStream
and handed them off to a separate consumer task that did the searching.
I used this as an opportunity to learn TPL Dataflow, which is very well suited for quickly coding this pattern.
Why BufferedStream is faster
A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance. A buffer can be used for either reading or writing, but never both simultaneously. The Read and Write methods of BufferedStream automatically maintain the buffer.
December 2014 UPDATE: Your Mileage May Vary
Based on the comments, FileStream should be using a BufferedStream internally. At the time this answer was first provided, I measured a significant performance boost by adding a BufferedStream. At the time I was targeting .NET 3.x on a 32-bit platform. Today, targeting .NET 4.5 on a 64-bit platform, I do not see any improvement.
Related
I came across a case where streaming a large, generated CSV file to the Response stream from an ASP.Net MVC action was very slow. Adding a BufferedStream improved performance by 100x in this instance. For more see Unbuffered Output Very Slow
This should be enough to get you started.
class Program
{
static void Main(String[] args)
{
const int bufferSize = 1024;
var sb = new StringBuilder();
var buffer = new Char[bufferSize];
var length = 0L;
var totalRead = 0L;
var count = bufferSize;
using (var sr = new StreamReader(@"C:\Temp\file.txt"))
{
length = sr.BaseStream.Length;
while (count > 0)
{
count = sr.Read(buffer, 0, bufferSize);
sb.Append(buffer, 0, count);
totalRead += count;
}
}
Console.ReadKey();
}
}
Have a look at the following code snippet. You have mentioned Most files will be 30-40 MB
. This claims to read 180 MB in 1.4 seconds on an Intel Quad Core:
private int _bufferSize = 16384;
private void ReadFile(string filename)
{
StringBuilder stringBuilder = new StringBuilder();
FileStream fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read);
using (StreamReader streamReader = new StreamReader(fileStream))
{
char[] fileContents = new char[_bufferSize];
int charsRead = streamReader.Read(fileContents, 0, _bufferSize);
// Can't do much with 0 bytes
if (charsRead == 0)
throw new Exception("File is 0 bytes");
while (charsRead > 0)
{
stringBuilder.Append(fileContents);
charsRead = streamReader.Read(fileContents, 0, _bufferSize);
}
}
}
Original Article
All excellent answers! however, for someone looking for an answer, these appear to be somewhat incomplete.
As a standard String can only of Size X, 2Gb to 4Gb depending on your configuration, these answers do not really fulfil the OP's question. One method is to work with a List of Strings:
List<string> Words = new List<string>();
using (StreamReader sr = new StreamReader(@"C:\Temp\file.txt"))
{
string line = string.Empty;
while ((line = sr.ReadLine()) != null)
{
Words.Add(line);
}
}
Some may want to Tokenise and split the line when processing. The String List now can contain very large volumes of Text.
You say you have been asked to show a progress bar while a large file is loading. Is that because the users genuinely want to see the exact % of file loading, or just because they want visual feedback that something is happening?
If the latter is true, then the solution becomes much simpler. Just do reader.ReadToEnd()
on a background thread, and display a marquee-type progress bar instead of a proper one.
I raise this point because in my experience this is often the case. When you are writing a data processing program, then users will definitely be interested in a % complete figure, but for simple-but-slow UI updates, they are more likely to just want to know that the computer hasn't crashed. :-)