HashSet vs. List performance

Asked by 小鲜肉 on 2020-11-22 09:19

It's clear that the search performance of the generic HashSet<T> class is higher than that of the generic List<T> class. Just compare the hash…

12 Answers
  • 2020-11-22 09:41

    It's essentially pointless to compare the performance of two structures that behave differently. Use the structure that conveys your intent. Even if you say your List<T> would never contain duplicates and iteration order doesn't matter, making it comparable to a HashSet<T>, it's still a poor choice to use List<T>, because it is relatively less fault-tolerant (see the short sketch after the notes below).

    That said, here are some other aspects of performance:

    +------------+--------+-------------+-----------+----------+----------+-----------+
    | Collection | Random | Containment | Insertion | Addition |  Removal | Memory    |
    |            | access |             |           |          |          |           |
    +------------+--------+-------------+-----------+----------+----------+-----------+
    | List<T>    | O(1)   | O(n)        | O(n)      | O(1)*    | O(n)     | Lesser    |
    | HashSet<T> | O(n)   | O(1)        | n/a       | O(1)     | O(1)     | Greater** |
    +------------+--------+-------------+-----------+----------+----------+-----------+
    
    * Even though addition is O(1) in both cases, it will be relatively slower in HashSet<T>, since it involves the cost of computing the hash code before storing the item.

    ** The superior scalability of HashSet<T> comes at a memory cost: every entry is stored as a new object along with its hash code. This article might give you an idea.
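
    To make the behavioral difference concrete, here is a minimal C# sketch (the item values are arbitrary examples, and the usual using System; and using System.Collections.Generic; directives are assumed): List<T> accepts duplicates and answers Contains by scanning, while HashSet<T> enforces uniqueness and answers Contains via the hash code.

    // A List<T> keeps whatever you add, in order; Contains is a linear scan.
    var list = new List<string> { "a", "b" };
    list.Add("a");                            // duplicate is accepted silently
    Console.WriteLine(list.Count);            // 3
    Console.WriteLine(list.Contains("b"));    // True, found by scanning the elements

    // A HashSet<T> enforces uniqueness; Contains hashes the key and probes one bucket.
    var set = new HashSet<string> { "a", "b" };
    Console.WriteLine(set.Add("a"));          // False: the duplicate is rejected
    Console.WriteLine(set.Count);             // 2
    Console.WriteLine(set.Contains("b"));     // True, found via its hash code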

  • 2020-11-22 09:42

    A lot of people are saying that once you get to the size where speed is actually a concern, HashSet<T> will always beat List<T>, but that depends on what you are doing.

    Let's say you have a List<T> that will only ever have on average 5 items in it. Over a large number of cycles, if a single item is added or removed each cycle, you may well be better off using a List<T>.

    I ran a test for this on my machine, and the collection has to be very small for List<T> to keep an advantage. For a list of short strings the advantage disappeared after size 5; for objects, after size 20.

    1 item LIST strs time: 617ms
    1 item HASHSET strs time: 1332ms
    
    2 item LIST strs time: 781ms
    2 item HASHSET strs time: 1354ms
    
    3 item LIST strs time: 950ms
    3 item HASHSET strs time: 1405ms
    
    4 item LIST strs time: 1126ms
    4 item HASHSET strs time: 1441ms
    
    5 item LIST strs time: 1370ms
    5 item HASHSET strs time: 1452ms
    
    6 item LIST strs time: 1481ms
    6 item HASHSET strs time: 1418ms
    
    7 item LIST strs time: 1581ms
    7 item HASHSET strs time: 1464ms
    
    8 item LIST strs time: 1726ms
    8 item HASHSET strs time: 1398ms
    
    9 item LIST strs time: 1901ms
    9 item HASHSET strs time: 1433ms
    
    1 item LIST objs time: 614ms
    1 item HASHSET objs time: 1993ms
    
    4 item LIST objs time: 837ms
    4 item HASHSET objs time: 1914ms
    
    7 item LIST objs time: 1070ms
    7 item HASHSET objs time: 1900ms
    
    10 item LIST objs time: 1267ms
    10 item HASHSET objs time: 1904ms
    
    13 item LIST objs time: 1494ms
    13 item HASHSET objs time: 1893ms
    
    16 item LIST objs time: 1695ms
    16 item HASHSET objs time: 1879ms
    
    19 item LIST objs time: 1902ms
    19 item HASHSET objs time: 1950ms
    
    22 item LIST objs time: 2136ms
    22 item HASHSET objs time: 1893ms
    
    25 item LIST objs time: 2357ms
    25 item HASHSET objs time: 1826ms
    
    28 item LIST objs time: 2555ms
    28 item HASHSET objs time: 1865ms
    
    31 item LIST objs time: 2755ms
    31 item HASHSET objs time: 1963ms
    
    34 item LIST objs time: 3025ms
    34 item HASHSET objs time: 1874ms
    
    37 item LIST objs time: 3195ms
    37 item HASHSET objs time: 1958ms
    
    40 item LIST objs time: 3401ms
    40 item HASHSET objs time: 1855ms
    
    43 item LIST objs time: 3618ms
    43 item HASHSET objs time: 1869ms
    
    46 item LIST objs time: 3883ms
    46 item HASHSET objs time: 2046ms
    
    49 item LIST objs time: 4218ms
    49 item HASHSET objs time: 1873ms
    

    Here is that data displayed as a graph (List<T> vs. HashSet<T> timings by collection size; image not reproduced here).

    Here's the code:

    // Requires: using System; using System.Collections.Generic; using System.Diagnostics;
    static void Main(string[] args)
    {
        int times = 10000000;
    
    
        // Strings: for each collection size from 1 to 9, time 10,000,000 remove/add cycles.
        for (int listSize = 1; listSize < 10; listSize++)
        {
            List<string> list = new List<string>();
            HashSet<string> hashset = new HashSet<string>();
    
            for (int i = 0; i < listSize; i++)
            {
                list.Add("string" + i.ToString());
                hashset.Add("string" + i.ToString());
            }
    
            Stopwatch timer = new Stopwatch();
            timer.Start();
            for (int i = 0; i < times; i++)
            {
                list.Remove("string0");
                list.Add("string0");
            }
            timer.Stop();
            Console.WriteLine(listSize.ToString() + " item LIST strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
    
    
            timer = new Stopwatch();
            timer.Start();
            for (int i = 0; i < times; i++)
            {
                hashset.Remove("string0");
                hashset.Add("string0");
            }
            timer.Stop();
            Console.WriteLine(listSize.ToString() + " item HASHSET strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
            Console.WriteLine();
        }
    
    
        // Objects: the same test for collection sizes 1 to 49, in steps of 3.
        for (int listSize = 1; listSize < 50; listSize += 3)
        {
            List<object> list = new List<object>();
            HashSet<object> hashset = new HashSet<object>();
    
            for (int i = 0; i < listSize; i++)
            {
                list.Add(new object());
                hashset.Add(new object());
            }
    
            object objToAddRem = list[0];
    
            Stopwatch timer = new Stopwatch();
            timer.Start();
            for (int i = 0; i < times; i++)
            {
                list.Remove(objToAddRem);
                list.Add(objToAddRem);
            }
            timer.Stop();
            Console.WriteLine(listSize.ToString() + " item LIST objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
    
    
    
            timer = new Stopwatch();
            timer.Start();
            for (int i = 0; i < times; i++)
            {
                hashset.Remove(objToAddRem);
                hashset.Add(objToAddRem);
            }
            timer.Stop();
            Console.WriteLine(listSize.ToString() + " item HASHSET objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
            Console.WriteLine();
        }
    
        Console.ReadLine();
    }
    
  • 2020-11-22 09:42

    You're looking at this the wrong way. Yes, a linear search of a List<T> will beat a HashSet<T> for a small number of items, but the performance difference usually doesn't matter for collections that small. It's generally the large collections you have to worry about, and that's where you think in terms of Big-O. However, if you've measured a real bottleneck on HashSet<T> performance, then you can try to create a hybrid List/HashSet, but you'll do that by conducting lots of empirical performance tests, not by asking questions on SO.
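
    As one illustration of that last suggestion, here is a hypothetical sketch (not from the original answer) of a hybrid set that scans a List<T> while the collection is tiny and cuts over to a HashSet<T> once it grows; the threshold of 10 is an assumed value you would tune with your own measurements.

    // Hypothetical hybrid: linear scan while tiny, hash lookups once the set grows.
    public class HybridSet<T>
    {
        private const int Threshold = 10;        // assumed cutover point; tune by benchmarking
        private List<T> _small = new List<T>();
        private HashSet<T> _large;               // created lazily when Threshold is exceeded

        public bool Add(T item)
        {
            if (_large != null) return _large.Add(item);
            if (_small.Contains(item)) return false;      // preserve set semantics (no duplicates)
            _small.Add(item);
            if (_small.Count > Threshold)
            {
                _large = new HashSet<T>(_small);          // cut over to hashing
                _small = null;
            }
            return true;
        }

        public bool Contains(T item)
        {
            return _large != null ? _large.Contains(item) : _small.Contains(item);
        }
    }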

  • 2020-11-22 09:42

    It depends on what you're hashing. If your keys are integers, you probably don't need very many items before the HashSet is faster. If you're keying on a string, it will be slower, and it depends on the input string.

    Surely you could whip up a benchmark pretty easily?
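
    A rough sketch of that kind of benchmark, comparing lookups keyed on ints versus strings (the set size, iteration count, and key values are arbitrary example choices; assumes using System, System.Collections.Generic, System.Diagnostics, and System.Linq):

    var intSet = new HashSet<int>(Enumerable.Range(0, 1000));
    var stringSet = new HashSet<string>(Enumerable.Range(0, 1000).Select(i => "item" + i));

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 10000000; i++) intSet.Contains(500);          // an int's hash code is the value itself
    sw.Stop();
    Console.WriteLine("int keys:    " + sw.ElapsedMilliseconds + "ms");

    sw = Stopwatch.StartNew();
    for (int i = 0; i < 10000000; i++) stringSet.Contains("item500"); // the string is hashed on every call
    sw.Stop();
    Console.WriteLine("string keys: " + sw.ElapsedMilliseconds + "ms");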

  • 2020-11-22 09:43

    Just thought I'd chime in with some benchmarks for different scenarios to illustrate the previous answers:

    1. A few (12 - 20) small strings (length between 5 and 10 characters)
    2. Many (~10K) small strings
    3. A few long strings (length between 200 and 1000 characters)
    4. Many (~5K) long strings
    5. A few integers
    6. Many (~10K) integers

    And for each scenario, looking up values which appear:

    1. In the beginning of the list ("start", index 0)
    2. Near the beginning of the list ("early", index 1)
    3. In the middle of the list ("middle", index count/2)
    4. Near the end of the list ("late", index count-2)
    5. At the end of the list ("end", index count-1)

    Before each scenario I generated randomly sized lists of random strings, and then fed each list to a hashset. Each scenario ran 10,000 times, essentially:

    (test pseudocode)

    stopwatch.start
    for X times
        exists = list.Contains(lookup);
    stopwatch.stop
    
    stopwatch.start
    for X times
        exists = hashset.Contains(lookup);
    stopwatch.stop
    
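    For reference, a minimal runnable C# version of that pseudocode (the list, hashset, and lookup values are supplied by the caller, and the iteration count mirrors the X above; assumes using System, System.Collections.Generic, and System.Diagnostics):

    // Times X lookups against both collections, mirroring the pseudocode above.
    static void TimeLookups<T>(List<T> list, HashSet<T> hashset, T lookup, int times)
    {
        bool exists = false;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < times; i++)
            exists = list.Contains(lookup);       // linear scan
        sw.Stop();
        Console.WriteLine("list:    " + sw.Elapsed.TotalSeconds + " sec (found: " + exists + ")");

        sw = Stopwatch.StartNew();
        for (int i = 0; i < times; i++)
            exists = hashset.Contains(lookup);    // hash lookup
        sw.Stop();
        Console.WriteLine("hashset: " + sw.Elapsed.TotalSeconds + " sec (found: " + exists + ")");
    }
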

    Sample Output

    Tested on Windows 7, 12 GB RAM, 64-bit, Xeon 2.8 GHz

    ---------- Testing few small strings ------------
    Sample items: (16 total)
    vgnwaloqf diwfpxbv tdcdc grfch icsjwk
    ...
    
    Benchmarks:
    1: hashset: late -- 100.00 % -- [Elapsed: 0.0018398 sec]
    2: hashset: middle -- 104.19 % -- [Elapsed: 0.0019169 sec]
    3: hashset: end -- 108.21 % -- [Elapsed: 0.0019908 sec]
    4: list: early -- 144.62 % -- [Elapsed: 0.0026607 sec]
    5: hashset: start -- 174.32 % -- [Elapsed: 0.0032071 sec]
    6: list: middle -- 187.72 % -- [Elapsed: 0.0034536 sec]
    7: list: late -- 192.66 % -- [Elapsed: 0.0035446 sec]
    8: list: end -- 215.42 % -- [Elapsed: 0.0039633 sec]
    9: hashset: early -- 217.95 % -- [Elapsed: 0.0040098 sec]
    10: list: start -- 576.55 % -- [Elapsed: 0.0106073 sec]
    
    
    ---------- Testing many small strings ------------
    Sample items: (10346 total)
    dmnowa yshtrxorj vthjk okrxegip vwpoltck
    ...
    
    Benchmarks:
    1: hashset: end -- 100.00 % -- [Elapsed: 0.0017443 sec]
    2: hashset: late -- 102.91 % -- [Elapsed: 0.0017951 sec]
    3: hashset: middle -- 106.23 % -- [Elapsed: 0.0018529 sec]
    4: list: early -- 107.49 % -- [Elapsed: 0.0018749 sec]
    5: list: start -- 126.23 % -- [Elapsed: 0.0022018 sec]
    6: hashset: early -- 134.11 % -- [Elapsed: 0.0023393 sec]
    7: hashset: start -- 372.09 % -- [Elapsed: 0.0064903 sec]
    8: list: middle -- 48,593.79 % -- [Elapsed: 0.8476214 sec]
    9: list: end -- 99,020.73 % -- [Elapsed: 1.7272186 sec]
    10: list: late -- 99,089.36 % -- [Elapsed: 1.7284155 sec]
    
    
    ---------- Testing few long strings ------------
    Sample items: (19 total)
    hidfymjyjtffcjmlcaoivbylakmqgoiowbgxpyhnrreodxyleehkhsofjqenyrrtlphbcnvdrbqdvji...
    ...
    
    Benchmarks:
    1: list: early -- 100.00 % -- [Elapsed: 0.0018266 sec]
    2: list: start -- 115.76 % -- [Elapsed: 0.0021144 sec]
    3: list: middle -- 143.44 % -- [Elapsed: 0.0026201 sec]
    4: list: late -- 190.05 % -- [Elapsed: 0.0034715 sec]
    5: list: end -- 193.78 % -- [Elapsed: 0.0035395 sec]
    6: hashset: early -- 215.00 % -- [Elapsed: 0.0039271 sec]
    7: hashset: end -- 248.47 % -- [Elapsed: 0.0045386 sec]
    8: hashset: start -- 298.04 % -- [Elapsed: 0.005444 sec]
    9: hashset: middle -- 325.63 % -- [Elapsed: 0.005948 sec]
    10: hashset: late -- 431.62 % -- [Elapsed: 0.0078839 sec]
    
    
    ---------- Testing many long strings ------------
    Sample items: (5000 total)
    yrpjccgxjbketcpmnvyqvghhlnjblhgimybdygumtijtrwaromwrajlsjhxoselbucqualmhbmwnvnpnm
    ...
    
    Benchmarks:
    1: list: early -- 100.00 % -- [Elapsed: 0.0016211 sec]
    2: list: start -- 132.73 % -- [Elapsed: 0.0021517 sec]
    3: hashset: start -- 231.26 % -- [Elapsed: 0.003749 sec]
    4: hashset: end -- 368.74 % -- [Elapsed: 0.0059776 sec]
    5: hashset: middle -- 385.50 % -- [Elapsed: 0.0062493 sec]
    6: hashset: late -- 406.23 % -- [Elapsed: 0.0065854 sec]
    7: hashset: early -- 421.34 % -- [Elapsed: 0.0068304 sec]
    8: list: middle -- 18,619.12 % -- [Elapsed: 0.3018345 sec]
    9: list: end -- 40,942.82 % -- [Elapsed: 0.663724 sec]
    10: list: late -- 41,188.19 % -- [Elapsed: 0.6677017 sec]
    
    
    ---------- Testing few ints ------------
    Sample items: (16 total)
    7266092 60668895 159021363 216428460 28007724
    ...
    
    Benchmarks:
    1: hashset: early -- 100.00 % -- [Elapsed: 0.0016211 sec]
    2: hashset: end -- 100.45 % -- [Elapsed: 0.0016284 sec]
    3: list: early -- 101.83 % -- [Elapsed: 0.0016507 sec]
    4: hashset: late -- 108.95 % -- [Elapsed: 0.0017662 sec]
    5: hashset: middle -- 112.29 % -- [Elapsed: 0.0018204 sec]
    6: hashset: start -- 120.33 % -- [Elapsed: 0.0019506 sec]
    7: list: late -- 134.45 % -- [Elapsed: 0.0021795 sec]
    8: list: start -- 136.43 % -- [Elapsed: 0.0022117 sec]
    9: list: end -- 169.77 % -- [Elapsed: 0.0027522 sec]
    10: list: middle -- 237.94 % -- [Elapsed: 0.0038573 sec]
    
    
    ---------- Testing many ints ------------
    Sample items: (10357 total)
    370826556 569127161 101235820 792075135 270823009
    ...
    
    Benchmarks:
    1: list: early -- 100.00 % -- [Elapsed: 0.0015132 sec]
    2: hashset: end -- 101.79 % -- [Elapsed: 0.0015403 sec]
    3: hashset: early -- 102.08 % -- [Elapsed: 0.0015446 sec]
    4: hashset: middle -- 103.21 % -- [Elapsed: 0.0015618 sec]
    5: hashset: late -- 104.26 % -- [Elapsed: 0.0015776 sec]
    6: list: start -- 126.78 % -- [Elapsed: 0.0019184 sec]
    7: hashset: start -- 130.91 % -- [Elapsed: 0.0019809 sec]
    8: list: middle -- 16,497.89 % -- [Elapsed: 0.2496461 sec]
    9: list: end -- 32,715.52 % -- [Elapsed: 0.4950512 sec]
    10: list: late -- 33,698.87 % -- [Elapsed: 0.5099313 sec]
    
  • 2020-11-22 09:48

    One factor you're not taking into account is the robustness of the GetHashCode() function. With a perfect hash function the HashSet will clearly have better search performance. But as the quality of the hash function degrades, so will the HashSet's search performance.
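
    As a small, hypothetical illustration of that point (not from the answer): a key type whose GetHashCode() puts every instance in the same bucket turns HashSet<T>.Contains into a walk of one long collision chain, i.e. effectively the same linear search a List<T> would do.

    // Degenerate hash: every instance gets the same hash code, so all entries share one bucket
    // and HashSet<T> lookups degrade to a linear scan of that bucket's collision chain.
    class BadKey
    {
        public int Id;

        public override int GetHashCode()
        {
            return 42;                            // same bucket for every instance
        }

        public override bool Equals(object obj)
        {
            BadKey other = obj as BadKey;
            return other != null && other.Id == Id;
        }
    }
    // A well-behaved key would return something like Id from GetHashCode(),
    // spreading entries across buckets and keeping Contains close to O(1).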
