Language: vb.net File size: 1GB, and stuff.
Encoding of the text file: UTF-8 (so each character is represented by different numbers of bytes)
Depending on how long the lines are, you may be able to compute an MD5 hash value for each line and store that in a HashSet. If a hash is already in the set, the line is a duplicate:
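' Needs Imports System.IO (StreamReader) and System.Linq (Select)
Using sr As New StreamReader("myFile")
    ' One hex-encoded MD5 digest per line seen so far
    Dim lines As New HashSet(Of String)
    Dim md5 As New Security.Cryptography.MD5Cng()
    ' EndOfStream respects the reader's internal buffer; BaseStream.Position
    ' runs ahead of the lines actually returned, so it can end the loop early
    While Not sr.EndOfStream
        Dim l As String = sr.ReadLine()
        Dim hash As String = String.Join(String.Empty, md5.ComputeHash(System.Text.Encoding.UTF8.GetBytes(l)).Select(Function(x) x.ToString("x2")))
        ' Add returns False when the hash is already in the set
        If Not lines.Add(hash) Then
            'Lines are not unique
            Exit While
        End If
    End While
End Using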
Untested, but this may be fast enough for your needs. I can't think of something much faster that still maintains some semblance of conciseness :)
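To put a rough number on "depending on how long the lines are": the hex form of an MD5 digest is always 32 characters, and .NET strings are UTF-16, so each stored hash costs 64 bytes of character data no matter how long the line was. The sketch below compares that with keeping the raw lines in the set. The 1 GB figure comes from the question; the average line lengths and the names `HashMemoryEstimate` and `PayloadBytes` are made up for illustration, the text is assumed to be mostly one byte per character, and per-object and HashSet overhead is ignored, so treat the output as a lower bound only.

```vbnet
Imports System

Module HashMemoryEstimate
    ' Characters in the hex form of a 128-bit MD5 digest (16 bytes, 2 hex digits each)
    Const HexDigestChars As Integer = 32
    ' .NET strings are UTF-16: 2 bytes per character
    Const BytesPerChar As Integer = 2

    ' Character payload in bytes when one string of charsPerEntry characters
    ' is kept per line. Ignores object headers and HashSet bookkeeping.
    Function PayloadBytes(lineCount As Long, charsPerEntry As Integer) As Long
        Return lineCount * charsPerEntry * BytesPerChar
    End Function

    Sub Main()
        ' Assumption: about 1 GB of text at roughly one byte per character
        Dim fileBytes As Long = 1000000000L

        ' Hypothetical average line lengths, in characters
        For Each avgLineLength As Integer In {20, 32, 200}
            Dim lineCount As Long = fileBytes \ avgLineLength
            Dim rawBytes As Long = PayloadBytes(lineCount, avgLineLength)
            Dim hashBytes As Long = PayloadBytes(lineCount, HexDigestChars)
            Console.WriteLine("avg length {0}: raw lines {1} bytes, hex hashes {2} bytes",
                              avgLineLength, rawBytes, hashBytes)
        Next
    End Sub
End Module
```

The takeaway is that hashing only saves memory when lines average more than 32 characters. For shorter lines, putting the lines themselves into the HashSet is cheaper and skips the MD5 work entirely.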