How to ensure that a file contains only unique lines in VB.NET when the file is very large

面向向阳花 2021-01-24 18:48

Language: VB.NET. File size: 1 GB.

Encoding of the text file: UTF-8 (so each character may be represented by a different number of bytes)

2 answers
  •  感情败类
    2021-01-24 19:30

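    Depending on how long the lines are, you may be able to compute an MD5 hash value for each line and store that in a HashSet. This only saves memory when the lines are longer than the 32-character hex string that represents each hash: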

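    ' Assumes Imports System.IO and System.Linq.
    Using sr As New StreamReader("myFile", System.Text.Encoding.UTF8)
        Using md5 As Security.Cryptography.MD5 = Security.Cryptography.MD5.Create()
            ' Holds the hex-encoded MD5 hash of every line read so far.
            Dim lines As New HashSet(Of String)
            Dim l As String = sr.ReadLine()

            ' ReadLine returns Nothing at end of file. Do not compare
            ' BaseStream.Position with BaseStream.Length: StreamReader reads
            ' ahead in buffers, so that test can end the loop too early.
            While l IsNot Nothing
                Dim hash As String = String.Join(String.Empty, md5.ComputeHash(System.Text.Encoding.UTF8.GetBytes(l)).Select(Function(x) x.ToString("x2")))

                ' Add returns False if the hash is already in the set.
                If Not lines.Add(hash) Then
                    'Lines are not unique
                    Exit While
                End If

                l = sr.ReadLine()
            End While
        End Using
    End Using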
    

    Untested, but this may be fast enough for your needs. I can't think of something much faster that still maintains some semblance of conciseness :)
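
    If memory becomes the limit, one possible variation is to keep each 16-byte MD5 digest as a compact value instead of a 32-character hex string. The sketch below is only an illustration under stated assumptions: `UniqueLineCheck`, `AllLinesUnique`, and `filePath` are hypothetical names, and `Guid` is used purely as a convenient 16-byte value type with built-in equality, not as a real identifier. As with the code above, two different lines could in principle share an MD5 hash, so a reported duplicate is worth confirming against the actual lines if that matters.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Security.Cryptography
    Imports System.Text

    Module UniqueLineCheck
        ' Hypothetical helper: returns True when no two lines share an MD5 digest.
        Function AllLinesUnique(filePath As String) As Boolean
            ' Each entry holds one 16-byte digest wrapped in a Guid.
            Dim seen As New HashSet(Of Guid)()

            Using hasher As MD5 = MD5.Create()
                ' File.ReadLines enumerates lazily, so the file is never fully in memory.
                For Each line As String In File.ReadLines(filePath, Encoding.UTF8)
                    Dim digest As Byte() = hasher.ComputeHash(Encoding.UTF8.GetBytes(line))

                    ' Add returns False when this digest was already recorded.
                    If Not seen.Add(New Guid(digest)) Then
                        Return False
                    End If
                Next
            End Using

            Return True
        End Function

        Sub Main(args As String())
            If args.Length = 1 Then
                Console.WriteLine(AllLinesUnique(args(0)))
            End If
        End Sub
    End Module
    ```

    The function stops at the first repeated digest, so a file with an early duplicate is rejected without reading the rest.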
