Remove Duplicate Lines From Text File?

孤街浪徒 2020-12-09 05:58

Given an input file of text lines, I want duplicate lines to be identified and removed. Please show a simple snippet of C# that accomplishes this.

5 Answers
  •  醉梦人生
    2020-12-09 06:48

    This should do (and will cope with large files).

    Note that it only removes duplicate consecutive lines, i.e.

    a
    b
    b
    c
    b
    d
    

    will end up as

    a
    b
    c
    b
    d
    

    If you want no duplicates anywhere, you'll need to keep a set of lines you've already seen.

    using System;
    using System.IO;
    
    class DeDuper
    {
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: DeDuper <input file> <output file>");
                return;
            }
            using (TextReader reader = File.OpenText(args[0]))
            using (TextWriter writer = File.CreateText(args[1]))
            {
                string currentLine;
                string lastLine = null;
    
                while ((currentLine = reader.ReadLine()) != null)
                {
                    if (currentLine != lastLine)
                    {
                        writer.WriteLine(currentLine);
                        lastLine = currentLine;
                    }
                }
            }
        }
    }
    

    Note that this assumes Encoding.UTF8, and that you want to use files. It's easy to generalize as a method though:

    static void CopyLinesRemovingConsecutiveDupes
        (TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;
    
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }
    

    (Note that that doesn't close anything - the caller should do that.)
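
    Since the method leaves closing to the caller, a caller might look roughly like the sketch below. It also passes the encoding explicitly instead of relying on the UTF-8 default. The names DeDuperFiles and DedupeFile are made up for illustration, and the copy method is repeated only so the snippet compiles on its own.

```csharp
using System.IO;
using System.Text;

static class DeDuperFiles
{
    // Hypothetical wrapper (not from the answer above): opens both files
    // with the given encoding and disposes them when the copy finishes,
    // because the copy method itself closes nothing.
    public static void DedupeFile(string inputPath, string outputPath, Encoding encoding)
    {
        using (TextReader reader = new StreamReader(inputPath, encoding))
        using (TextWriter writer = new StreamWriter(outputPath, false, encoding))
        {
            CopyLinesRemovingConsecutiveDupes(reader, writer);
        }
    }

    // Same logic as the generalized method, repeated for self-containment.
    static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;

        while ((currentLine = reader.ReadLine()) != null)
        {
            // Write a line only when it differs from the one just before it.
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }
}
```

    Calling DeDuperFiles.DedupeFile("in.txt", "out.txt", Encoding.UTF8) would then behave like the console program, with the two using statements taking care of closing both files even if an exception is thrown mid-copy.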

    Here's a version that will remove all duplicates, rather than just consecutive ones:

    static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        // HashSet<T> needs: using System.Collections.Generic;
        HashSet<string> previousLines = new HashSet<string>();
    
        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns true if it was actually added,
            // false if it was already there
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }
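
    To try the all-duplicates version without touching the disk, something like the following should work, since StringReader and StringWriter are just in-memory TextReader/TextWriter implementations. The AllDupesDemo class and its Run method are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class AllDupesDemo
{
    // Same logic as the method above, repeated for self-containment.
    static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        HashSet<string> previousLines = new HashSet<string>();

        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns false when the line has been seen before.
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }

    // Hypothetical helper: feeds in-memory text through the method and
    // returns the surviving lines. Splitting with RemoveEmptyEntries also
    // drops blank lines, which is acceptable for this demo only.
    public static string[] Run(string input)
    {
        using (var reader = new StringReader(input))
        using (var writer = new StringWriter())
        {
            CopyLinesRemovingAllDupes(reader, writer);
            return writer.ToString().Split(
                new[] { Environment.NewLine },
                StringSplitOptions.RemoveEmptyEntries);
        }
    }
}
```

    With the sample from the top of this answer, AllDupesDemo.Run("a\nb\nb\nc\nb\nd") returns a, b, c, d: the first occurrence of each line is kept in its original position. The set compares strings ordinally, so "B" and "b" count as different lines.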
    
