Which is faster, Hash lookup or Binary search?

你的背包 2020-12-02 06:47

When given a static set of objects (static in the sense that once loaded it seldom if ever changes) into which repeated concurrent lookups are needed with optimal performanc

17 Answers
  • 2020-12-02 07:10

    I'd say it depends mainly on the performance of the hash and compare methods. For example, with string keys that are very long but random, a comparison usually finishes after the first few characters, whereas a default hash function processes the entire string.

    But in most cases the hash map should be faster.
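
The trade-off above can be made concrete by counting how many characters each operation has to read. This is only a sketch: `KeyCost`, `CharsExaminedByCompare`, and `Fnv1a` are hypothetical names, and FNV-1a stands in for "a hash that reads the whole key"; it is not the hash .NET uses for strings.

```csharp
using System;

public static class KeyCost
{
    // Number of characters an ordinal comparison inspects before it can decide:
    // it stops at the first mismatch, or at the end of the shorter string.
    public static int CharsExaminedByCompare(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i]) return i + 1;
        }
        return n;
    }

    // Illustrative FNV-1a hash (an assumption, not .NET's string hash).
    // It reads every character, so its cost grows with the key length.
    public static uint Fnv1a(string s, out int charsExamined)
    {
        uint hash = 2166136261;
        foreach (char c in s)
        {
            hash = unchecked((hash ^ c) * 16777619);
        }
        charsExamined = s.Length;
        return hash;
    }
}
```

For two 1,000-character keys that differ in their first character, the comparison reads one character while the hash reads all 1,000. Keys that share a long common prefix lose that advantage.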

  • 2020-12-02 07:11

    Surprised nobody mentioned cuckoo hashing, which provides guaranteed O(1) lookups and, unlike perfect hashing, can use all of the memory it allocates, whereas perfect hashing can give guaranteed O(1) while wasting the greater part of its allocation. The caveat? Insertion can be very slow, especially as the number of elements grows, since all of the optimization is performed during insertion.

    I believe some version of this is used in router hardware for IP lookups.
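
To show why lookups are bounded while insertion carries the cost, here is a minimal cuckoo hash set for `long` keys. It is a sketch under assumptions, not a production design: `CuckooSet`, `MaxKicks`, the two-table layout, and the mixing constants are all illustrative choices, and the set simply doubles its tables when an insertion evicts too many times.

```csharp
using System;

public sealed class CuckooSet
{
    private long[] _t1, _t2;          // two tables, one candidate slot per key in each
    private bool[] _used1, _used2;
    private int _capacity;            // slots per table
    private const int MaxKicks = 32;  // eviction limit before growing (arbitrary choice)

    public CuckooSet(int capacity = 16)
    {
        _capacity = capacity;
        _t1 = new long[capacity]; _t2 = new long[capacity];
        _used1 = new bool[capacity]; _used2 = new bool[capacity];
    }

    // Illustrative 64-bit mixer; any two reasonably independent hashes would do.
    private static ulong Mix(ulong x)
    {
        unchecked
        {
            x ^= x >> 30; x *= 0xBF58476D1CE4E5B9UL;
            x ^= x >> 27; x *= 0x94D049BB133111EBUL;
            x ^= x >> 31;
        }
        return x;
    }

    private int Slot1(long key) => (int)(Mix((ulong)key) % (ulong)_capacity);
    private int Slot2(long key) => (int)(Mix((ulong)key ^ 0xD6E8FEB86659FD93UL) % (ulong)_capacity);

    // A lookup inspects at most two slots, regardless of how many keys are stored.
    public bool Contains(long key)
    {
        int i = Slot1(key);
        if (_used1[i] && _t1[i] == key) return true;
        int j = Slot2(key);
        return _used2[j] && _t2[j] == key;
    }

    public void Add(long key)
    {
        if (Contains(key)) return;
        long current = key;
        for (int kick = 0; kick < MaxKicks; kick++)
        {
            int i = Slot1(current);
            if (!_used1[i]) { _t1[i] = current; _used1[i] = true; return; }
            (current, _t1[i]) = (_t1[i], current);   // evict the occupant of table 1

            int j = Slot2(current);
            if (!_used2[j]) { _t2[j] = current; _used2[j] = true; return; }
            (current, _t2[j]) = (_t2[j], current);   // evict the occupant of table 2
        }
        Grow();          // too many evictions: rebuild with larger tables
        Add(current);    // then place the key that is still displaced
    }

    private void Grow()
    {
        long[] old1 = _t1, old2 = _t2;
        bool[] oldUsed1 = _used1, oldUsed2 = _used2;
        _capacity *= 2;
        _t1 = new long[_capacity]; _t2 = new long[_capacity];
        _used1 = new bool[_capacity]; _used2 = new bool[_capacity];
        for (int i = 0; i < old1.Length; i++)
        {
            if (oldUsed1[i]) Add(old1[i]);
            if (oldUsed2[i]) Add(old2[i]);
        }
    }
}
```

`Contains` never loops, which is where the guaranteed O(1) lookup comes from; `Add` may chain evictions or rebuild everything, which matches the slow-insertion caveat.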

  • 2020-12-02 07:11

    A Dictionary/Hashtable uses more memory and takes longer to populate than an array, but lookups in a Dictionary are faster than a binary search over an array.

    Here are the numbers for populating and searching 10 million Int64 items, plus sample code you can run yourself.

    Dictionary Memory:   462,836
    Array Memory:        88,376
    Populate Dictionary: 402 ms
    Populate Array:      23 ms
    Search Dictionary:   176 ms
    Search Array:        680 ms

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    
    namespace BinaryVsDictionary
    {
        internal class Program
        {
            private const long Capacity = 10000000;
    
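            // Note: the initial capacity (Int16.MaxValue = 32,767) is far below the 10,000,000 items added,
            // so the populate timing below includes the cost of the dictionary growing repeatedly.
            private static readonly Dictionary<long, long> Dict = new Dictionary<long, long>(Int16.MaxValue);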
            private static readonly long[] Arr = new long[Capacity];
    
            private static void Main(string[] args)
            {
                Stopwatch stopwatch = new Stopwatch();
    
                stopwatch.Start();
    
                for (long i = 0; i < Capacity; i++)
                {
                    Dict.Add(i, i);
                }
    
                stopwatch.Stop();
    
                Console.WriteLine("Populate Dictionary: " + stopwatch.ElapsedMilliseconds);
    
                stopwatch.Reset();
    
                stopwatch.Start();
    
                for (long i = 0; i < Capacity; i++)
                {
                    Arr[i] = i;
                }
    
                stopwatch.Stop();
    
                Console.WriteLine("Populate Array:      " + stopwatch.ElapsedMilliseconds);
    
                stopwatch.Reset();
    
                stopwatch.Start();
    
                for (long i = 0; i < Capacity; i++)
                {
                    long value = Dict[i];
                }
    
                stopwatch.Stop();
    
                Console.WriteLine("Search Dictionary:   " + stopwatch.ElapsedMilliseconds);
    
                stopwatch.Reset();
    
                stopwatch.Start();
    
                for (long i = 0; i < Capacity; i++)
                {
                    // hi is an inclusive bound, so pass the last valid index rather than the length
                    long value = BinarySearch(Arr, 0, Capacity - 1, i);
                }
    
                stopwatch.Stop();
    
                Console.WriteLine("Search Array:        " + stopwatch.ElapsedMilliseconds);
    
                Console.ReadLine();
            }
    
            // Iterative binary search over arr[low..hi], both bounds inclusive; arr must be sorted ascending.
            private static long BinarySearch(long[] arr, long low, long hi, long value)
            {
                while (low <= hi)
                {
                    long median = low + ((hi - low) >> 1);
    
                    if (arr[median] == value)
                    {
                        return median;
                    }
    
                    if (arr[median] < value)
                    {
                        low = median + 1;
                    }
                    else
                    {
                        hi = median - 1;
                    }
                }
    
                // Not found: return the bitwise complement of the insertion point (always negative)
                return ~low;
            }
        }
    }
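
The hand-written search above follows the same return convention as the framework's `Array.BinarySearch`: a non-negative index on a hit, otherwise the bitwise complement of the insertion point. A short sketch of using the built-in instead (the wrapper names `BuiltInSearch`, `IndexOf`, and `InsertionPoint` are hypothetical):

```csharp
using System;

public static class BuiltInSearch
{
    // Index of value in a sorted array, or -1 if it is absent.
    public static int IndexOf(long[] sorted, long value)
    {
        int index = Array.BinarySearch(sorted, value);
        return index >= 0 ? index : -1;
    }

    // Position at which value sits, or would have to be inserted to keep the array sorted.
    public static int InsertionPoint(long[] sorted, long value)
    {
        int index = Array.BinarySearch(sorted, value);
        return index >= 0 ? index : ~index;   // a negative result encodes ~insertionPoint
    }
}
```

With the sorted array `{ 10, 20, 30, 40 }`, `IndexOf` returns 2 for 30 and -1 for 25, while `InsertionPoint` returns 2 for 25.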
    
  • 2020-12-02 07:15

    Hashes are typically faster, although binary searches have better worst-case characteristics. A hash access is typically a calculation to get a hash value to determine which "bucket" a record will be in, and so the performance will generally depend on how evenly the records are distributed, and the method used to search the bucket. A bad hash function (leaving a few buckets with a whole lot of records) with a linear search through the buckets will result in a slow search. (On the third hand, if you're reading a disk rather than memory, the hash buckets are likely to be contiguous while the binary tree pretty much guarantees non-local access.)

    If you want generally fast, use the hash. If you really want guaranteed bounded performance, you might go with the binary tree.
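
The effect of a bad hash function can be observed by counting key comparisons. In this sketch (all names are hypothetical), `CountingKey` either hashes to its id or, in the bad variant, returns the same hash code for every key so that all entries pile into one bucket of a `Dictionary`.

```csharp
using System;
using System.Collections.Generic;

public sealed class CountingKey : IEquatable<CountingKey>
{
    public static long EqualsCalls;   // key comparisons performed so far

    private readonly int _id;
    private readonly bool _badHash;

    public CountingKey(int id, bool badHash) { _id = id; _badHash = badHash; }

    // Bad variant: every key reports the same hash code and lands in one bucket.
    public override int GetHashCode() => _badHash ? 0 : _id;

    public bool Equals(CountingKey other)
    {
        EqualsCalls++;
        return other != null && other._id == _id;
    }

    public override bool Equals(object obj) => Equals(obj as CountingKey);
}

public static class BucketDemo
{
    // Looks up each of n keys once and returns how many Equals calls that took.
    public static long CountEqualsCalls(int n, bool badHash)
    {
        var dict = new Dictionary<CountingKey, int>();
        for (int i = 0; i < n; i++) dict.Add(new CountingKey(i, badHash), i);

        CountingKey.EqualsCalls = 0;   // ignore comparisons made while populating
        for (int i = 0; i < n; i++)
        {
            _ = dict[new CountingKey(i, badHash)];
        }
        return CountingKey.EqualsCalls;
    }
}
```

Calling `BucketDemo.CountEqualsCalls(1000, badHash: true)` should report far more comparisons than the `badHash: false` run, because each lookup has to walk a chain of colliding entries linearly.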

  • 2020-12-02 07:17

    This question is more complicated than pure algorithmic performance. If we set aside the fact that binary search is more cache friendly, hash lookup is generally faster. The best way to find out is to build a test program with compiler optimizations disabled; there, hash lookup comes out ahead, since its time complexity is O(1) on average.

    But when you enable compiler optimizations and run the same test with a smaller sample count, say fewer than 10,000, binary search outperforms hash lookup by taking advantage of its cache-friendly data structure.
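
Claims like this are easy to check on your own hardware. The following is a rough timing sketch, not a rigorous benchmark: `SmallSetBenchmark` and `Run` are hypothetical names, and there is no warm-up or statistical treatment, so results will vary with the runtime, build configuration, and machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SmallSetBenchmark
{
    // Probes every value in [0, 2*count) against a set holding the even numbers,
    // once with binary search over a sorted array and once with a HashSet.
    public static (int binaryHits, int hashHits, long binaryMs, long hashMs) Run(int count, int rounds)
    {
        long[] sorted = new long[count];
        for (int i = 0; i < count; i++) sorted[i] = 2L * i;   // even numbers, already sorted
        var set = new HashSet<long>(sorted);

        int binaryHits = 0, hashHits = 0;

        var sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            for (long probe = 0; probe < 2L * count; probe++)
                if (Array.BinarySearch(sorted, probe) >= 0) binaryHits++;
        long binaryMs = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int r = 0; r < rounds; r++)
            for (long probe = 0; probe < 2L * count; probe++)
                if (set.Contains(probe)) hashHits++;
        long hashMs = sw.ElapsedMilliseconds;

        return (binaryHits, hashHits, binaryMs, hashMs);
    }
}
```

Running it with something like `SmallSetBenchmark.Run(5000, 2000)` in both Debug and Release builds, and again with a much larger `count`, gives a first impression of where the crossover lies on a given machine.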
