Which is faster, Hash lookup or Binary search?

你的背包 2020-12-02 06:47

When given a static set of objects (static in the sense that once loaded it seldom if ever changes) into which repeated concurrent lookups are needed with optimal performance, which is better: a hash lookup or a binary search?

17 answers
  • 2020-12-02 06:52

    If your set of objects is truly static and unchanging, you can use a perfect hash to get O(1) performance guaranteed. I've seen gperf mentioned a few times, though I've never had occasion to use it myself.

  • 2020-12-02 06:52

    I wonder why no one mentioned perfect hashing.

    It's only relevant if your dataset is fixed for a long time, but what it does is analyze the data and construct a perfect hash function that ensures no collisions.

    Pretty neat, if your data set is constant and the time to calculate the function is small compared to the application run time.
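
    A minimal sketch of the idea in C# (an illustration only, not gperf's actual algorithm; PerfectHashSketch, FindSeed, and Slot are made-up names, and real tools construct the function far more cleverly than brute force):

        using System;
        using System.Linq;

        static class PerfectHashSketch
        {
            // Brute-force a seed until the seeded hash maps every key to a distinct slot.
            // Feasible only for small, fixed key sets.
            public static int FindSeed(string[] keys)
            {
                for (int seed = 1; ; seed++)
                    if (keys.Select(k => Slot(k, seed, keys.Length)).Distinct().Count() == keys.Length)
                        return seed; // no collisions: each lookup is one hash plus one compare
            }

            public static int Slot(string key, int seed, int slots)
            {
                unchecked
                {
                    int h = seed;
                    foreach (char c in key) h = h * 31 + c; // simple seeded polynomial hash
                    return (h & 0x7FFFFFFF) % slots;
                }
            }
        }

    Once a collision-free seed is found offline, every run-time lookup is a single hash plus one key compare.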

  • 2020-12-02 06:53

    Because the universe of possible keys is reasonably big, hash functions can only be built to be "very injective": collisions rarely happen, but they do happen, so the access time for a hash table is not strictly O(1); it is probabilistic. Still, it is reasonable to say that the access time of a hash is almost always less than O(log_2(n)).
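
    A back-of-the-envelope comparison (assuming the textbook separate-chaining model with load factor alpha = n / buckets, not any particular implementation):

        int n = 1_000_000;
        double alpha = 1.0;                  // one item per bucket on average
        double hashProbes = 1 + alpha / 2;   // expected compares for a successful hash lookup
        double binaryProbes = Math.Log2(n);  // ~20 compares for a binary search
        Console.WriteLine($"hash ~ {hashProbes} compares, binary ~ {binaryProbes:F1} compares");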

  • 2020-12-02 06:54

    Ok, I'll try to be short.

    C# short answer:

    Test the two different approaches.

    .NET gives you the tools to change your approach with a line of code. Otherwise use System.Collections.Generic.Dictionary, and be sure to initialize it with a large number as the initial capacity, or you'll spend the rest of your life inserting items because of the work the GC has to do collecting old bucket arrays. An example follows.
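
    For example (MyValue is a placeholder type for whatever you store):

        // Pre-sizing avoids repeated bucket-array growth and the GC work
        // of collecting the old arrays.
        var lookup = new Dictionary<string, MyValue>(capacity: 1_000_000);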

    Longer answer:

    A hashtable has ALMOST constant lookup times, and getting to an item in a hash table in the real world requires more than just computing a hash.

    To get to an item, your hashtable will do something like this:

    • Get the hash of the key
    • Get the bucket index for that hash (usually the map function looks like this: bucket = hash % bucketsCount)
    • Traverse the chain of items that starts at that bucket (basically a list of items that share the same bucket; most hashtables use this method of handling bucket/hash collisions) and compare each key with the key of the item you are trying to add/delete/update/check for (see the sketch after this list)
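
    A compact sketch of those three steps (TinyHashMap, Entry, and _buckets are illustrative names, not the real Dictionary internals):

        using System.Collections.Generic;

        class TinyHashMap<TKey, TValue>
        {
            class Entry { public TKey Key; public TValue Value; public Entry Next; }
            readonly Entry[] _buckets = new Entry[1024];

            public bool TryGetValue(TKey key, out TValue value)
            {
                int hash = key.GetHashCode() & 0x7FFFFFFF;  // 1. get the hash of the key
                int bucket = hash % _buckets.Length;        // 2. bucket = hash % bucketsCount
                for (Entry e = _buckets[bucket]; e != null; e = e.Next) // 3. walk the chain
                    if (EqualityComparer<TKey>.Default.Equals(e.Key, key))
                    {
                        value = e.Value;
                        return true;
                    }
                value = default;
                return false;
            }
        }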

    Lookup times depend on how "good" (how uniformly it spreads its output) and how fast your hash function is, on the number of buckets you are using, and on how fast the key comparer is; a hashtable is not always the best solution.

    A better and deeper explanation: http://en.wikipedia.org/wiki/Hash_table

  • 2020-12-02 07:00

    I strongly suspect that in a problem set of size ~1M, hashing would be faster.

    Just for the numbers:

    a binary search would require ~ 20 compares (2^20 == 1M)

    a hash lookup would require 1 hash calculation on the search key, and possibly a handful of compares afterwards to resolve possible collisions

    Edit: the numbers:

        // c and d were not declared in the original snippet; assumed values below.
        string c = "abcde", d = "rwerij";

        // ~1M hash computations: one per hash lookup
        for (int i = 0; i < 1000 * 1000; i++) {
            c.GetHashCode();
        }

        // ~1M binary searches at ~20 compares each (2^20 ≈ 1M)
        for (int i = 0; i < 1000 * 1000; i++) {
            for (int j = 0; j < 20; j++)
                c.CompareTo(d);
        }
    

    Times (c = "abcde", d = "rwerij"): hash code: 0.0012 seconds; compare: 2.4 seconds.

    Disclaimer: actually benchmarking a hash lookup against a binary search might be better than this not-entirely-representative test. I'm not even sure whether GetHashCode is memoized under the hood. A sketch of a more direct benchmark follows.
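
    A more direct micro-benchmark (a sketch only; for trustworthy numbers use a proper harness such as BenchmarkDotNet, with warm-up runs):

        using System;
        using System.Collections.Generic;
        using System.Diagnostics;

        var keys = new string[1_000_000];
        for (int i = 0; i < keys.Length; i++) keys[i] = "key" + i;
        Array.Sort(keys, StringComparer.Ordinal);

        var dict = new Dictionary<string, int>(keys.Length);
        for (int i = 0; i < keys.Length; i++) dict[keys[i]] = i;

        var sw = Stopwatch.StartNew();
        foreach (var k in keys) dict.TryGetValue(k, out _);
        Console.WriteLine($"hash:   {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        foreach (var k in keys) Array.BinarySearch(keys, k, StringComparer.Ordinal);
        Console.WriteLine($"binary: {sw.ElapsedMilliseconds} ms");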

  • 2020-12-02 07:01

    The answer depends. Let's assume the number of elements n is very large. If you are good at writing a hash function with fewer collisions, then hashing is best. Note that the hash function is executed only once per search, and it directs you straight to the corresponding bucket, so it is not a big overhead even when n is high.

    Problem with hashtables: if the hash function is not good (more collisions happen), then the search isn't O(1); it tends toward O(n), because searching within a bucket is a linear search. That can be worse than a binary tree.

    Problem with binary trees: if the tree isn't balanced, it also tends toward O(n). For example, inserting 1, 2, 3, 4, 5 into a plain binary search tree produces something that is effectively a list.

    So: if you can find a good hashing methodology, use a hashtable; if not, you are better off with a binary tree.
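
    As a hypothetical illustration of the bad-hash case (BadKey is a made-up type, not from the question): with a constant hash code every key lands in the same bucket, and Dictionary lookups degrade from O(1) toward O(n).

        class BadKey
        {
            public int Id;
            public override int GetHashCode() => 0;  // every key collides into one bucket
            public override bool Equals(object o) => o is BadKey b && b.Id == Id;
        }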
