I have a text file that contains about 100,000 articles. The structure of the file is:
.Document ID 42944-YEAR:5 .Date 03\\08\\11 .Cat political Article Content 1 .
Something like this:
using (var fileStream = File.OpenText(@"path to file"))
{
    // Check EndOfStream *before* reading, so an empty file
    // never hands a null line to the processing code.
    while (!fileStream.EndOfStream)
    {
        var fileLine = fileStream.ReadLine();
        // process fileLine here
    }
}
You can open the file and read it as a stream rather than loading everything into memory all at once.
From MSDN:
using System;
using System.IO;

class Test
{
    public static void Main()
    {
        try
        {
            // Create an instance of StreamReader to read from a file.
            // The using statement also closes the StreamReader.
            using (StreamReader sr = new StreamReader("TestFile.txt"))
            {
                String line;
                // Read and display lines from the file until the end of
                // the file is reached.
                while ((line = sr.ReadLine()) != null)
                {
                    Console.WriteLine(line);
                }
            }
        }
        catch (Exception e)
        {
            // Let the user know what went wrong.
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }
}
If you are using .NET Framework 4, there is a new static method on System.IO.File called ReadLines that returns an IEnumerable&lt;string&gt; and reads the lines lazily. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.
MSDN Documentation - File.ReadLines Method (String)
Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
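For what it's worth, here is a minimal sketch of how File.ReadLines could be used (the file path and line contents are made up for the example; CountLines is just an illustrative helper, not part of the API):

```csharp
using System;
using System.IO;

class ReadLinesExample
{
    // Streams the file lazily and returns how many lines were seen.
    public static int CountLines(string path)
    {
        int count = 0;
        // File.ReadLines returns a lazy IEnumerable<string>: each line is
        // pulled from disk as the foreach advances, so the whole file is
        // never held in memory at once.
        foreach (string line in File.ReadLines(path))
        {
            Console.WriteLine(line);
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // Write a small sample file so the sketch is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "ReadLinesExample.txt");
        File.WriteAllLines(path, new[] { "line 1", "line 2", "line 3" });
        Console.WriteLine(CountLines(path));
    }
}
```

The difference from File.ReadAllLines is that ReadAllLines materializes the entire file as a string[] up front, while ReadLines yields one line at a time.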
Your file is too large to be read into memory in one go, as File.ReadAllText
is trying to do. You should instead read the file line by line.
Adapted from MSDN:
string line;
// Read the file and display it line by line.
using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
{
    while ((line = file.ReadLine()) != null)
    {
        Console.WriteLine(line);
        // do your processing on each line here
    }
}
In this way, no more than a single line of the file is in memory at any one time.
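Since each of your articles spans several lines, the same line-at-a-time approach can group lines into whole articles while still keeping only one article in memory. This is a hypothetical sketch, assuming (from the sample in the question) that every article starts with a line beginning with ".Document"; ReadArticles and the sample text are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ArticleSplitter
{
    // Groups streamed lines into articles, assuming each article starts
    // with a line that begins with ".Document". Only the current article
    // is buffered; earlier ones are yielded and released.
    public static IEnumerable<string> ReadArticles(TextReader reader)
    {
        var current = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A new ".Document" header ends the previous article.
            if (line.StartsWith(".Document") && current.Count > 0)
            {
                yield return string.Join(Environment.NewLine, current);
                current.Clear();
            }
            current.Add(line);
        }
        // Emit whatever is left after the last line.
        if (current.Count > 0)
        {
            yield return string.Join(Environment.NewLine, current);
        }
    }

    public static void Main()
    {
        // StringReader stands in for a StreamReader over the real file.
        var sample = ".Document ID 1\n.Cat political\nbody one\n.Document ID 2\nbody two";
        foreach (string article in ReadArticles(new StringReader(sample)))
        {
            Console.WriteLine("--- article ---");
            Console.WriteLine(article);
        }
    }
}
```

Because ReadArticles is an iterator (yield return), callers can foreach over 100,000 articles without more than one being resident at a time.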