Comparing the contents of two huge text files quickly

走了就别回头了 2021-02-03 16:44

What I'm basically trying to do is compare two HUGE text files and, if they match, write out a string. I have this written, but it's extremely slow. I was hoping you guys might

3 answers
  •  迷失自我
    2021-02-03 16:51

    • Call File.ReadLines() (.NET 4) instead of ReadAllLines() (.NET 2.0).
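      ReadAllLines has to read the whole file and build an array to hold the return value before it returns, which can be extremely slow for large files; ReadLines returns the lines lazily as you enumerate them.
      If you're not using .NET 4.0, read the file line by line with a StreamReader instead.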

    • Build a Dictionary from the matchCollects (once), then loop through foundlist and check whether the dictionary contains matchFound.
      This allows you to replace the O(n) inner loop with an O(1) hash lookup.

    • Use a StreamWriter that stays open for the whole loop instead of calling AppendText.

    • EDIT: Call Path.GetFileNameWithoutExtension and the other Path methods instead of manually manipulating strings.

    For example:

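    // Needs: using System; using System.IO; using System.Linq;
    // foundlist, start and end are the variables from the question's code.

    // Key: file name without ".txt" (4th path segment); value: the full line.
    // Note: ToDictionary throws if two lines produce the same key.
    var collection = File.ReadLines(@"C:\found.txt")
        .ToDictionary(s => s.Split('\\')[3].Replace(".txt", ""));

    // One writer for the whole loop instead of reopening the file per line.
    using (var writer = new StreamWriter(@"C:\Copy.txt")) {
        foreach (string found in foundlist) {
            string[] splitFound = found.Split('|');
            string matchFound = Path.GetFileNameWithoutExtension(found);

            // O(1) hash lookup replaces the inner loop over the second file.
            string collectedLine;
            if (collection.TryGetValue(matchFound, out collectedLine)) {
                end++;
                long finaldest = (start - end);
                Console.WriteLine(finaldest);
                writer.WriteLine("copy \"" + collectedLine + "\" \"C:\\OUT\\"
                               + splitFound[1] + "\\" + splitFound[0] + ".txt\"");
            }
        }
    }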
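
    If you are stuck on a framework older than .NET 4, the StreamReader fallback from the first bullet could look roughly like the sketch below. LineReader and ReadLinesLazily are hypothetical names chosen for illustration; they are not part of the framework or of the question's code.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;

    static class LineReader
    {
        // Hypothetical helper: yields one line at a time, so the whole file
        // is never held in memory the way ReadAllLines() holds it.
        public static IEnumerable<string> ReadLinesLazily(string path)
        {
            using (StreamReader reader = new StreamReader(path))
            {
                string line;
                // ReadLine() returns null once the end of the file is reached.
                while ((line = reader.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }
    }
    ```

    You would then enumerate LineReader.ReadLinesLazily(@"C:\found.txt") wherever the example above calls File.ReadLines. Keep in mind that ToDictionary comes from LINQ (.NET 3.5), so on .NET 2.0 you would fill the Dictionary in a plain foreach loop instead.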
    
