Hash table - implementing with Binary Search Tree

问题

From Cracking the Coding Interview, page 71:

Alternatively, we can implement hash table with a BST. We can then guarantee an O(log n) lookup time, since we can keep the tree balanced. Additionally we may use less space, since a large array no longer needs to be allocated in the very beginning.

I know the basics of linked lists, hash tables and BSTs, but I am unable to understand these lines. What does it actually mean? Would this final data structure would be a Trie?

回答1:

The full text of that section states, with the last paragraph being the one you asked about:

A hash table is a data structure that maps keys to values for highly efficient lookup. In a very simple implementation of a hash table, the hash table has an underlying array and a hash function. When you want to insert an object and its key, the hash function maps the key to an integer, which indicates the index in the array. The object is then stored at that index.

Typically, though, this won't quite work right. In the above implementation, the hash value of all possible keys must be unique, or we might accidentally overwrite data. The array would have to be extremely large—the size of all possible keys—to prevent such "collisions."

Instead of making an extremely large array and storing objects at index hash (key), we can make the array much smaller and store objects in a linked list at index hash (key) % array_length.To get the object with a particular key, we must search the linked list for this key.

Alternatively, we can implement the hash table with a binary search tree. We can then guarantee an 0(log n) lookup time, since we can keep the tree balanced. Additionally, we may use less space, since a large array no longer needs to be allocated in the very beginning.

So they're talking about using a BST (binary search tree) to handle collisions. It wouldn't actually make sense to use a BST as the sole data structure since the whole point of a properly tuned hash is that look-up is on the order of O(1), much better than the O(log n) from a BST. On top of that, using a BST to totally implement a hash table means it's not actually a hash table :-)

However, consider that, when you have collisions in a hash table, a frequent way to handle them is to have each bucket contain a linked list of its items. In the degenerate case (all items hashing to the same bucket), you end up with just a linked list and the O(1) turns into O(n).

So, rather than a linked list at each bucket, you have a BST. Then you no longer have O(n) search complexity in cases where a single bucket has many items (the previously mentioned collisions).

You use the hash function to find the bucket in O(1) then search through the BST in O(log n) if there are collisions. In the best case (one item per bucket), it's still O(1). The worst case then becomes O(log n) rather than O(n).

The only thing that originally concerned me about that theory is that they also discuss the fact that a large allocation is no longer necessary. If it's a shared hash/BST combination, you still need to allocate the entire hash table so that seemed incongruous.

However, from the context ("... since a large array no longer needs to be allocated ..."), it appears that they mean they can make the hash table part of the dual data structure smaller as the collisions are more efficient to process. In other words, rather than a 1000-element hash table with linked lists for collisions, you can get away with a 100-element hash table because the collisions are not so damaging to the search time if you use a BST.

回答2:

You're conflating a few terms here.

The idea would be to implement the hash table with both the array and a BST in a two-tiered fashion. One would still add values into the hash if there were no collision, but if there was, then one could solve the performance of retrieving a collided element with the BST.
A trie is something entirely different; depending on what you were attempting to store, you might not be able to apply it to a hashing function.

回答3:

a O(logN) bound would be the worst case scenario in case of a tree.Lets look at it this way. We insert 45, 33, 55,66,22 and we would then have 45 as the root node,33 and 55 in level1,22 and 66 in level 2..

So, if you were to hash for value 45, it would still be a O(1) operation...Only when you look for nodes in level2 would it amount close to O(logN)....The tree could be a RB tree/AVL tree so that it does not degenerate into a linked list....You lose some tiem efficiency but make up for it in space efficiency..

One more advantage would be you don't need to bother about collisions then in hash table. http://www.cs.rit.edu/~ib/Classes/CS233_Spring08-09/Slides/Week9_Hashing.pdf

Basically, you would have a dynamic allocation of nodes and no space is wasted on unused buckets in hash table...Say, you were to use a static hash table witha pre-determined size(busckets), then, it would lead to a space inefficient implementation .

来源：https://stackoverflow.com/questions/27739241/hash-table-implementing-with-binary-search-tree

标签

data-structures

hashtable

binary-search-tree