Efficiently merge string arrays in .NET, keeping distinct values

迷失自我 2021-02-02 06:38

I'm using .NET 3.5. I have two string arrays, which may share one or more values:

string[] list1 = new string[] { "apple", "orange", "banana" };
string[]         


        
6 answers
  •  栀梦 (OP)
    2021-02-02 06:56

    .NET 3.5 introduced the HashSet<T> class, which can do this (Union here is the LINQ extension method, so the result is a lazy sequence):

    IEnumerable<string> mergedDistinctList = new HashSet<string>(list1).Union(list2);
    

    I'm not sure about performance, but it should beat the LINQ example you gave.

    EDIT: I stand corrected. The lazy implementation of Concat and Distinct has a key memory and speed advantage: in my test Concat/Distinct was about 10% faster, and it avoids making extra copies of the data.
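
    To make the difference concrete, here is a minimal sketch (not the code I timed). The class and helper names (MergeSketch, MergeWithConcatDistinct, MergeWithHashSet) and the values in list2 are invented for illustration. It shows that both approaches yield the same set of values, that the Concat/Distinct query reads its inputs only when enumerated, and that the HashSet<string> constructor copies its input immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeSketch
{
    // Hypothetical helper: LINQ pipeline; nothing runs until ToArray() enumerates it.
    public static string[] MergeWithConcatDistinct(string[] a, string[] b)
    {
        return a.Concat(b).Distinct().ToArray();
    }

    // Hypothetical helper: copies `a` into a HashSet<string> first,
    // then streams `b` through the LINQ Union extension method.
    public static string[] MergeWithHashSet(string[] a, string[] b)
    {
        return new HashSet<string>(a).Union(b).ToArray();
    }

    public static void Main()
    {
        string[] list1 = { "apple", "orange", "banana" };
        string[] list2 = { "banana", "pear", "grape" }; // illustrative values only

        string[] viaLinq = MergeWithConcatDistinct(list1, list2);
        string[] viaSet = MergeWithHashSet(list1, list2);

        // Both results contain the same five distinct values.
        Console.WriteLine(viaLinq.Length);
        Console.WriteLine(new HashSet<string>(viaLinq).SetEquals(viaSet));

        // Deferred execution: the query sees a change made after it was built.
        IEnumerable<string> lazy = list1.Concat(list2).Distinct();
        list2[1] = "kiwi";
        Console.WriteLine(lazy.Contains("kiwi"));

        // Eager copy: the set does not see a change made after construction.
        HashSet<string> eager = new HashSet<string>(list1);
        list1[0] = "plum";
        Console.WriteLine(eager.Contains("plum"));
    }
}
```

    If you need to merge into an existing set in place rather than produce a new sequence, HashSet<T>.UnionWith does that, at the cost of holding the whole set in memory.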

    I confirmed through code:

    Setting up arrays of 3000000 strings overlapping by 300000
    Starting Hashset...
    HashSet: 00:00:02.8237616
    Starting Concat/Distinct...
    Concat/Distinct: 00:00:02.5629681
    

    is the output of:

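            // Requires: using System; using System.Collections.Generic;
            //           using System.Diagnostics; using System.Linq;
            int num = 3000000;
            int num10Pct = num / 10;
    
            Console.WriteLine(String.Format("Setting up arrays of {0} strings overlapping by {1}", num, num10Pct));
            string[] list1 = Enumerable.Range(1, num).Select((a) => a.ToString()).ToArray();
            // Note: Enumerable.Range takes (start, count), so list2 holds num + num10Pct strings.
            string[] list2 = Enumerable.Range(num - num10Pct, num + num10Pct).Select((a) => a.ToString()).ToArray();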
    
            Console.WriteLine("Starting Hashset...");
            Stopwatch sw = new Stopwatch();
            sw.Start();
            string[] merged = new HashSet<string>(list1).Union(list2).ToArray();
            sw.Stop();
            Console.WriteLine("HashSet: " + sw.Elapsed);
    
            Console.WriteLine("Starting Concat/Distinct...");
            sw.Reset();
            sw.Start();
            string[] merged2 = list1.Concat(list2).Distinct().ToArray();
            sw.Stop();
            Console.WriteLine("Concat/Distinct: " + sw.Elapsed);
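
    A single Stopwatch run like the one above includes JIT compilation and is sensitive to garbage collection, so a 10% gap could partly be noise. Below is a rough sketch of a more repeatable measurement. The MergeTiming class and BestOf helper are hypothetical names, and the smaller array size is chosen only so the sketch runs quickly; it is not the setup that produced the numbers above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class MergeTiming
{
    // Hypothetical helper: runs `merge` once untimed as a warm-up,
    // then returns the fastest of `runs` timed executions.
    public static TimeSpan BestOf(int runs, Func<string[]> merge)
    {
        merge(); // warm-up pass, not timed
        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            merge();
            sw.Stop();
            if (sw.Elapsed < best) best = sw.Elapsed;
        }
        return best;
    }

    public static void Main()
    {
        int num = 300000; // assumption: 10x smaller than the original test
        string[] list1 = Enumerable.Range(1, num).Select(a => a.ToString()).ToArray();
        string[] list2 = Enumerable.Range(num - num / 10, num).Select(a => a.ToString()).ToArray();

        Console.WriteLine("HashSet: " +
            BestOf(5, () => new HashSet<string>(list1).Union(list2).ToArray()));
        Console.WriteLine("Concat/Distinct: " +
            BestOf(5, () => list1.Concat(list2).Distinct().ToArray()));
    }
}
```

    Taking the best of several runs is only one way to reduce noise; averaging, or running the two candidates in alternating order, would be reasonable alternatives.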
    
