When given a static set of objects (static in the sense that, once loaded, it seldom if ever changes) into which repeated concurrent lookups are needed with optimal performance, which is better: a hashtable or a binary search?
If your set of objects is truly static and unchanging, you can use a perfect hash to get O(1) performance guaranteed. I've seen gperf mentioned a few times, though I've never had occasion to use it myself.
I wonder why no one mentioned perfect hashing.
It's only relevant if your dataset is fixed for a long time, but what it does is analyze the data and construct a perfect hash function that ensures no collisions.
Pretty neat, if your data set is constant and the time to calculate the function is small compared to the application run time.
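As a rough illustration of the idea (a toy sketch, not what gperf actually generates; PerfectHashSketch, Build and Contains are made-up names): for a fixed key set you can simply keep growing the table size until the built-in hash maps every key to a distinct slot, after which every lookup is one hash plus one comparison.

using System;

class PerfectHashSketch
{
    static string[] table;   // slot -> key, with no two keys sharing a slot
    static int m;            // table size that happens to be collision-free

    // Toy construction: try growing table sizes until hash(key) % m is
    // collision-free for the fixed key set (assumes the keys themselves have
    // distinct hash codes, which is why real tools like gperf construct a
    // custom hash function instead of reusing GetHashCode).
    static void Build(string[] keys)
    {
        for (m = keys.Length; ; m++)
        {
            var candidate = new string[m];
            bool collision = false;
            foreach (var k in keys)
            {
                int slot = (k.GetHashCode() & 0x7FFFFFFF) % m;
                if (candidate[slot] != null) { collision = true; break; }
                candidate[slot] = k;
            }
            if (!collision) { table = candidate; return; }
        }
    }

    static bool Contains(string key)
    {
        int slot = (key.GetHashCode() & 0x7FFFFFFF) % m;
        return table[slot] == key;   // exactly one comparison, no chain to walk
    }

    static void Main()
    {
        Build(new[] { "red", "green", "blue", "cyan", "magenta" });
        Console.WriteLine(Contains("green"));    // True
        Console.WriteLine(Contains("yellow"));   // False
    }
}

The build step can be arbitrarily expensive; it only pays off because the set never changes afterwards.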
Here it's described how hash tables are built, and because the universe of keys is reasonably big and hash functions are built to be "very injective" so that collisions rarely happen, the access time for a hash table is not actually O(1); it is based on probabilities. But it is reasonable to say that the access time of a hash table is almost always less than O(log_2(n)).
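To put rough numbers on that (the standard textbook result for separate chaining under the simple-uniform-hashing assumption, with load factor alpha = n/m):

expected comparisons per successful lookup ≈ 1 + alpha/2

With alpha = 1 and n = 10^6 that is roughly 1.5 comparisons, versus log_2(10^6) ≈ 20 for a binary search.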
Ok, I'll try to be short.
C# short answer:
Test the two different approaches.
.NET gives you the tools to change your approach with a single line of code. Otherwise use System.Collections.Generic.Dictionary and be sure to initialize it with a large initial capacity, or you'll spend the rest of your life inserting items because of the work the GC has to do to collect the old bucket arrays.
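For example, a minimal sketch of the pre-sized Dictionary approach (the one-million capacity and the "key" + i keys are just placeholders for your real data):

using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        // Pre-sizing means the dictionary never has to grow, so the GC never
        // has to collect the old bucket arrays left behind by each resize.
        const int n = 1000 * 1000;
        var lookup = new Dictionary<string, int>(n);

        for (int i = 0; i < n; i++)
            lookup.Add("key" + i, i);

        // Repeated lookups afterwards take expected constant time:
        if (lookup.TryGetValue("key123456", out int value))
            Console.WriteLine(value);   // prints 123456
    }
}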
Longer answer:
A hashtable has ALMOST constant lookup times, and getting to an item in a hash table in the real world doesn't just require computing a hash.
To get to an item, your hashtable will do something like this: compute the hash of the key, map that hash onto one of its buckets, and then compare the keys in that bucket until it finds a match.
Lookup times depend on how "good" (how sparse the output is) and how fast your hash function is, on the number of buckets you are using, and on how fast the key comparer is; a hashtable is not always the best solution.
A better and deeper explanation: http://en.wikipedia.org/wiki/Hash_table
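To make those steps concrete, here is a simplified sketch of a chained hash table (this is not .NET's actual Dictionary implementation, just the shape of the work each lookup does):

using System.Collections.Generic;

class ChainedHashSketch<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;
    private readonly IEqualityComparer<TKey> comparer;

    public ChainedHashSketch(int bucketCount, IEqualityComparer<TKey> comparer = null)
    {
        buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
        this.comparer = comparer ?? EqualityComparer<TKey>.Default;
    }

    public void Add(TKey key, TValue value)
    {
        int b = Bucket(key);
        (buckets[b] ??= new List<KeyValuePair<TKey, TValue>>())
            .Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        // 1. compute the hash of the key, 2. map it onto a bucket ...
        int b = Bucket(key);

        // ... 3. walk the bucket, running the key comparer until a match is found
        if (buckets[b] != null)
            foreach (var pair in buckets[b])
                if (comparer.Equals(pair.Key, key)) { value = pair.Value; return true; }

        value = default;
        return false;
    }

    private int Bucket(TKey key) =>
        (comparer.GetHashCode(key) & 0x7FFFFFFF) % buckets.Length;
}

The bucket count and the comparer are exactly the knobs mentioned above: a sparse, fast hash and enough buckets keep step 3 down to a comparison or two, while a poor hash or too few buckets turn it into a linear scan.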
I strongly suspect that in a problem set of size ~1M, hashing would be faster.
Just for the numbers:
a binary search would require ~ 20 compares (2^20 == 1M)
a hash lookup would require 1 hash calculation on the search key, and possibly a handful of compares afterwards to resolve possible collisions
Edit: the numbers:
string c = "abcde", d = "rwerij";
// one million hash computations
for (int i = 0; i < 1000 * 1000; i++) {
    c.GetHashCode();
}
// one million simulated binary searches: ~20 comparisons each (2^20 ≈ 1M)
for (int i = 0; i < 1000 * 1000; i++) {
    for (int j = 0; j < 20; j++)
        c.CompareTo(d);
}
times (with c = "abcde", d = "rwerij"): hashcode: 0.0012 seconds, compare: 2.4 seconds.
Disclaimer: actually benchmarking a hash lookup versus a binary lookup might be better than this not-entirely-relevant test. I'm not even sure whether GetHashCode gets memoized under the hood.
The answer depends. Let's assume the number of elements 'n' is very large. If you are good at writing a hash function with fewer collisions, then hashing is the best.
Note that the hash function is executed only once per search and directs you to the corresponding bucket, so it is not a big overhead even when n is high.
Problem with hashtables:
But the problem with hash tables is that if the hash function is not good (more collisions happen), then the search isn't O(1). It tends towards O(n), because searching within a bucket is a linear search. That can be worse than a binary tree.
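A quick way to see that degradation in C# (BadKey and its constant hash code are deliberately terrible, purely for illustration; exact timings will vary):

using System;
using System.Collections.Generic;
using System.Diagnostics;

class BadKey
{
    public int Id;
    // Deliberately awful hash: every key lands in the same bucket,
    // so every lookup degrades into a linear scan of the whole chain.
    public override int GetHashCode() => 42;
    public override bool Equals(object obj) => obj is BadKey other && other.Id == Id;
}

class CollisionDemo
{
    static void Main()
    {
        const int n = 10_000;
        var good = new Dictionary<int, int>(n);     // default int hash: keys spread out
        var bad = new Dictionary<BadKey, int>(n);   // everything collides

        for (int i = 0; i < n; i++) { good[i] = i; bad[new BadKey { Id = i }] = i; }

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) good.TryGetValue(i, out _);
        Console.WriteLine($"good hash: {sw.Elapsed}");

        sw.Restart();
        for (int i = 0; i < n; i++) bad.TryGetValue(new BadKey { Id = i }, out _);
        Console.WriteLine($"bad hash:  {sw.Elapsed}");   // orders of magnitude slower
    }
}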
Problem with binary trees:
In a binary tree, if the tree isn't balanced, lookups also tend towards O(n). For example, if you insert 1, 2, 3, 4, 5 into a plain binary search tree, it ends up looking more like a list.
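Here is that degenerate case with a naive, unbalanced binary search tree (a toy sketch; .NET's SortedDictionary/SortedSet avoid this by using a self-balancing red-black tree):

using System;

class NaiveBstDemo
{
    class Node { public int Value; public Node Left, Right; }

    // Plain BST insert with no rebalancing.
    static Node Insert(Node node, int value)
    {
        if (node == null) return new Node { Value = value };
        if (value < node.Value) node.Left = Insert(node.Left, value);
        else node.Right = Insert(node.Right, value);
        return node;
    }

    static int Depth(Node node) =>
        node == null ? 0 : 1 + Math.Max(Depth(node.Left), Depth(node.Right));

    static void Main()
    {
        Node root = null;

        // Inserting already-sorted keys: every new node goes to the right,
        // so the "tree" is really a linked list and lookups cost O(n).
        for (int i = 1; i <= 5; i++) root = Insert(root, i);
        Console.WriteLine(Depth(root));   // 5 (one level per element)
    }
}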
So,
if you can come up with a good hashing scheme, use a hashtable;
if not, you are better off using a binary tree.