Fastest way to implement Duplicate Character Removal in String (C#)

前端 未结 5 675
说谎
说谎 2020-12-14 12:43

In C#, what is the fastest way to detect duplicate characters in a String and remove them (removal including 1st instance of the duplicated character)?

相关标签:
5条回答
  • 2020-12-14 12:53

    This preserves order and, based on my tests, is 4 times faster than using a HashSet. This assumes your character range is 0-255 but you can extend that easily. If you're planning on using this in a loop, move the int[] c = new int[255]; out and in the function do an Array.Clear(c,0,255).

    
            private static string RemoveDuplicates(string s)
            {
                int[] c = new int[255];
                for (int i = 0; i < s.Length; i++)
                {
                    c[s[i]]++;
                }
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < s.Length; i++)
                {
                    if (c[s[i]] == 1) sb.Append(s[i]);
                }
                return sb.ToString();
            }
    
    0 讨论(0)
  • 2020-12-14 12:55

    Here is a pretty fast one preserving order. But I'd be a little worried about how LINQ does Group and Where:

    string s = "nbHHkRvrXbvkn";
    Console.WriteLine( 
        s.ToCharArray()
            .GroupBy(c => c)
            .Where(g => g.Count() == 1)
            .Aggregate(new StringBuilder(), (b, g) => b.Append(g.Key)));
    

    Edit: This one beats Luke's in some cases still slower than dtb's, but it preserves the order

    private static string MyMethod(string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        foreach (var g in s.ToCharArray().GroupBy(c => c))
            if (g.Count() == 1) sb.Append(g.Key);
    
        return sb.ToString();
    }
    
    0 讨论(0)
  • 2020-12-14 13:01

    Fastest as in fewest-lines-of-code:

    var s = "nbHHkRvrXbvkn";
    var duplicates = s.Where(ch => s.Count(c => c == ch) > 1);
    var result = new string(s.Except(duplicates).ToArray()); // = "RrX"
    

    Fastest as in fastest-performance would probably be something like this (does not preserve order):

    var h1 = new HashSet<char>();
    var h2 = new HashSet<char>();
    
    foreach (var ch in "nbHHkRvrXbvkn")
    {
        if (!h1.Add(ch))
        {
            h2.Add(ch);
        }
    }
    
    h1.ExceptWith(h2); // remove duplicates
    
    var chars = new char[h1.Count];
    h1.CopyTo(chars);
    var result = new string(chars); // = "RrX"
    

    Performance test

    When in doubt -- test it :)

    Yuriy Faktorovich's answer        00:00:00.2360900
    Luke's answer                     00:00:00.2225683
    My 'few lines' answer             00:00:00.5318395
    My 'fast' answer                  00:00:00.1842144
    
    0 讨论(0)
  • 2020-12-14 13:05

    This one should be pretty quick (and it preserves the original order):

    public static string RemoveDuplicates(string source)
    {
        HashSet<char> found = new HashSet<char>();
        HashSet<char> dupes = new HashSet<char>();
    
        foreach (char c in source)
        {
            if (!found.Add(c))
            {
                dupes.Add(c);
            }
        }
    
        StringBuilder sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            if (!dupes.Contains(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
    
    0 讨论(0)
  • 2020-12-14 13:19

    This algorithm is general, can be applied to any language

    1. create a map(HashTable) char->int that holds the count of each character found, initially empty
    2. scan the string once to populate the map.
    3. create a new empty string that'll hold the output, may be you'll need to use a StringBuilder.
    4. scan the string(orthe map, whichever is shorter) copying only characters which an occurrence of 1 to the output string/StringBuilder
    0 讨论(0)
提交回复
热议问题