I have a text file that contains about 100,000 articles. The structure of the file is:
.Document ID 42944-YEAR:5 .Date 03\\08\\11 .Cat political Article Content 1 .
Your file is too large to be read into memory in one go, as File.ReadAllText
is trying to do. You should instead read the file line by line.
Adapted from MSDN:
using System;
using System.IO;

string line;
// Read the file one line at a time and display each line.
using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
{
    while ((line = file.ReadLine()) != null)
    {
        Console.WriteLine(line);
        // do your processing on each line here
    }
}
This way, only the current line (plus the StreamReader's small internal buffer) is held in memory at any one time, rather than the whole file.
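If an article spans several lines, the same line-by-line loop can group lines into one article at a time. The following is only a rough sketch: it assumes every article begins on a line that starts with ".Document ID", which the sample in the question suggests but does not confirm. `ArticleReader` and `ReadArticles` are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ArticleReader
{
    // Hypothetical helper: yields one article at a time.
    // Assumption: each article starts on a line beginning with ".Document ID".
    public static IEnumerable<string> ReadArticles(TextReader reader)
    {
        var current = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A new header line closes the article collected so far.
            if (line.StartsWith(".Document ID", StringComparison.Ordinal) && current.Length > 0)
            {
                yield return current.ToString();
                current.Clear();
            }
            current.AppendLine(line);
        }
        // Emit the last article, which has no header after it.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }
}
```

You would pass it the same StreamReader as above, for example `foreach (string article in ArticleReader.ReadArticles(file)) { ... }`. Because the method uses `yield return`, only the article currently being built is kept in memory.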