I have a file with 450,000+ rows of entries. Each entry is about 7 characters in length. What I want to know is the unique characters of this file.
For instance, if my f
Where C:/data.txt
contains 454,863 rows of seven random alphabetic characters, the following code
using System;
using System.IO;
using System.Collections;
using System.Diagnostics;
namespace ConsoleApplication {
class Program {
static void Main(string[] args) {
FileInfo fileInfo = new FileInfo(@"C:/data.txt");
Console.WriteLine(fileInfo.Length);
Stopwatch sw = new Stopwatch();
sw.Start();
Hashtable table = new Hashtable();
StreamReader sr = new StreamReader(@"C:/data.txt");
while (!sr.EndOfStream) {
char c = Char.ToLower((char)sr.Read());
if (!table.Contains(c)) {
table.Add(c, null);
}
}
sr.Close();
foreach (char c in table.Keys) {
Console.Write(c);
}
Console.WriteLine();
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
}
}
produces output
4093767
mytojevqlgbxsnidhzupkfawr
c
889
Press any key to continue . . .
The first line of output tells you the number of bytes in C:/data.txt
(454,863 * (7 + 2) = 4,093,767 bytes). The next two lines of output are the unique characters in C:/data.txt
(including a newline). The last line of output tells you the number of milliseconds the code took to execute on a 2.80 GHz Pentium 4.