Hash Array Mapped Trie (HAMT)


I am trying to get my head around the details of a HAMT. I'd have implemented one myself in Java just to understand. I am familiar with Tries and I think I get the main con


4 Answers

轮回少年

2020-12-23 14:58

There are two sections of the paper I think you might have missed. The first is the bit immediately preceding the bit you quoted:

Or the key will collide with an existing one. In which case the existing key must be replaced with a sub-hash table and the next 5 bit hash of the existing key computed. If there is still a collision then this process is repeated until no collision occurs.

So if you have object A in the table and you add object B which clashes, the cell at which their keys clashed will be a pointer to another subtable (where they don't clash).
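The splitting step described above can be sketched roughly as follows. This is a simplified illustration, not the paper's code: it uses dense 32-slot Python lists instead of bitmap-compressed tables, the names `chunk` and `insert` are made up for this example, and it assumes each key's 32-bit hash is passed in by the caller. Two keys whose full 32-bit hashes are identical are deliberately left unhandled here.

```python
BITS = 5                 # hash bits consumed per trie level
WIDTH = 1 << BITS        # 32 slots per table
MASK = WIDTH - 1

def chunk(h, level):
    """Return the 5-bit slice of hash h that is used at the given level."""
    return (h >> (BITS * level)) & MASK

def insert(table, key, value, h, level=0):
    """Insert (key, value) into a dense 32-slot table; h is the key's hash."""
    i = chunk(h, level)
    slot = table[i]
    if slot is None:
        table[i] = (key, value, h)              # empty slot: store the entry
    elif isinstance(slot, list):
        insert(slot, key, value, h, level + 1)  # sub-table: descend one level
    elif slot[0] == key:
        table[i] = (key, value, h)              # same key: overwrite the value
    else:
        old_key, old_value, old_h = slot
        if old_h == h:
            # Assumption of this sketch: full collisions are out of scope.
            raise NotImplementedError("identical 32-bit hashes need a rehash")
        # Two different keys clash in this slot: replace the entry with a
        # sub-table and place both keys using their next 5 hash bits.
        sub = [None] * WIDTH
        table[i] = sub
        insert(sub, old_key, old_value, old_h, level + 1)
        insert(sub, key, value, h, level + 1)

root = [None] * WIDTH
insert(root, "A", 1, 0b00001_00011)   # low 5 bits = 3, next 5 bits = 1
insert(root, "B", 2, 0b00010_00011)   # low 5 bits = 3, next 5 bits = 2
```

After the second insert, `root[3]` is no longer an entry but a sub-table in which A and B sit in different slots.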

Next, Section 3.7 of the paper you linked describes the method for generating a new hash when you run off the end of your first 32 bits:

The hash function was tailored to give a 32 bit hash. The algorithm requires that the hash can be extended to an arbitrary number of bits. This was accomplished by rehashing the key combined with an integer representing the trie level, zero being the root. Hence if two keys do give the same initial hash then the rehash has a probability of 1 in 2^32 of a further collision.
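One way to picture the rehash from Section 3.7 is the sketch below. It is an assumption-laden illustration rather than the paper's hash function: `hash32` is a stand-in built on CRC-32, and `chunk_at` is a hypothetical helper that combines the key with the trie level at which the previous 32 bits ran out.

```python
import zlib

BITS = 5
CHUNKS_PER_HASH = 32 // BITS   # 6 full 5-bit chunks; the last 2 bits are unused

def hash32(key, seed):
    """Stand-in 32-bit hash: CRC-32 of the seed bytes followed by the key."""
    return zlib.crc32(seed.to_bytes(4, "little") + key.encode("utf-8"))

def chunk_at(key, level):
    """5-bit table index for key at the given trie level (0 is the root)."""
    offset = level % CHUNKS_PER_HASH
    seed = level - offset          # 0 for levels 0-5, 6 for levels 6-11, ...
    h = hash32(key, seed)          # a fresh hash each time the bits run out
    return (h >> (BITS * offset)) & 0x1F
```

Because `chunk_at` depends only on the key and the level, insertion and look-up derive exactly the same sequence of indices, however deep the trie gets.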

If this doesn't seem to explain anything, say and I'll extend this answer with more detail.

自闭症患者

2020-12-23 15:05

A HAMT is a great, highly performant structure, especially when one needs immutable objects, i.e. when every modification creates a new copy of the data structure!
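In persistent implementations the "new copy" typically shares every unchanged subtree with the old version, so only the nodes on the path to the change are copied. A minimal sketch of that idea, using nested tuples and a hypothetical `assoc` helper (not taken from any of the implementations mentioned here):

```python
def assoc(node, path, value):
    """Return a new tree with value stored at path; node itself is untouched."""
    i = path[0]
    children = list(node)                 # copy only this one node
    if len(path) == 1:
        children[i] = value
    else:
        children[i] = assoc(node[i], path[1:], value)
    return tuple(children)

left = ("a0", "a1")
right = ("b0", "b1")
old_root = (left, right)
new_root = assoc(old_root, [0, 1], "X")   # change one leaf in the left subtree
```

Here `old_root` still holds the original values, `new_root[0]` is a fresh tuple, and `new_root[1]` is the very same object as `old_root[1]`.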

As for your question on hash collisions, I have found a C# implementation (which is buggy now) that shows how it works: on each hash collision the key is rehashed and the lookup is retried recursively until a maximum iteration limit is reached.

Currently I am also exploring HAMT in a functional programming context and studying existing code. There are several reference implementations of HAMT: in Haskell as Data.HashMap and in Clojure as PersistentHashMap.

There are some other, simpler implementations on the web that do not deal with collisions, but they are useful for understanding the concept. Here they are in Haskell and OCaml.

I have found a nice summary article that describes HAMT with pictures and links to the original research papers by Phil Bagwell.

Related points:

While implementing a HAMT in F# I noticed that the popCount function implementation described here really matters: it gives a 10-15% improvement over the naive implementation described in the next answers in the link. Not great, but a free lunch.

Related IntMap structures (Haskell and its port to F#) are very good when the key can be an integer; they implement the related PATRICIA/radix trie.

I believe all these implementations are very good for learning efficient immutable data structures and functional languages in all their beauty. On these examples they really shine together!
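To show why popCount sits on the hot path of a HAMT, here is a small sketch of how a bitmap-compressed node turns a 5-bit hash chunk into an array index. The linked popCount implementation is not reproduced in this post, so `popcount32` below is the common branch-free bit-trick variant, chosen as an assumption; `slot_index` is a hypothetical helper name. The timing figure above is for F#, and this Python version only illustrates the logic.

```python
def popcount32(x):
    """Count the set bits of a 32-bit integer without looping over the bits."""
    x = x - ((x >> 1) & 0x55555555)                   # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)    # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                   # 8-bit partial sums
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24      # add the four bytes

def slot_index(bitmap, chunk):
    """Position in the compact child array for a 5-bit chunk, or None."""
    bit = 1 << chunk
    if (bitmap & bit) == 0:
        return None                       # no child stored for this chunk
    # Children are stored in bit order, so the index is the number of
    # occupied slots below this one.
    return popcount32(bitmap & (bit - 1))
```

With `bitmap = 0b101001` (children for chunks 0, 3 and 5), chunk 3 maps to array index 1 and chunk 5 to index 2, while chunk 1 has no child.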

太阳男子

2020-12-23 15:09
If I were to compute a "new" hash and store the object at that new hash, how would you ever be able to look up the object in the structure? When doing a look-up, wouldn't it generate the "initial" hash and not the "re-computed" hash?

A look-up starts with the initial hash. When the bits of the initial hash are exhausted, one of the following is true:
1. We end up at a key/value node: return it.
2. We end up at an index node: this is the hint that we have to go deeper by computing a new hash.
The key here is hash-bit exhaustion.
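The two cases above can be sketched as a loop over nested tables. This is only an illustration under assumptions: tables are dense 32-slot lists, key/value nodes are plain tuples, and `chunk_at(key, level)` is a hypothetical function supplied by the caller that returns the same 5-bit index insertion used at that level, including any rehash once the first 32 bits are used up.

```python
def lookup(table, key, chunk_at, level=0):
    """Return the value stored for key in nested 32-slot tables, or None."""
    while True:
        slot = table[chunk_at(key, level)]
        if slot is None:
            return None                       # empty slot: key is absent
        if isinstance(slot, list):
            table, level = slot, level + 1    # index node: go one level deeper
            continue
        stored_key, value = slot              # key/value node: check the key
        return value if stored_key == key else None

# Hand-built example: A and B share index 3 at the root and split below it.
paths = {"A": [3, 1], "B": [3, 2], "C": [4]}

def fake_chunk_at(key, level):
    """Stand-in for real hashing: read the index from a fixed table."""
    return paths[key][level]

sub = [None] * 32
sub[1] = ("A", "value of A")
sub[2] = ("B", "value of B")
root = [None] * 32
root[3] = sub
```

The look-up never needs to know whether a rehash happened. It just keeps asking for the index at the next level until it reaches a key/value node or an empty slot.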
一向

2020-12-23 15:21

The chance of collision is presumably very low, and generally only problematic for huge trees. Given this, you're better off just storing collisions in an array at the leaf and searching it linearly (I do this in my C# HAMT).
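A collision bucket of this kind could look roughly like the sketch below. It is not the C# code mentioned above; `bucket_put` and `bucket_get` are hypothetical names, and the bucket is simply a list of (key, value) pairs kept at a leaf for keys whose hashes are identical.

```python
def bucket_put(bucket, key, value):
    """Return a new bucket with key set to value; the old bucket is unchanged."""
    for i, (k, _) in enumerate(bucket):
        if k == key:
            return bucket[:i] + [(key, value)] + bucket[i + 1:]
    return bucket + [(key, value)]

def bucket_get(bucket, key, default=None):
    """Linear search through the colliding entries."""
    for k, v in bucket:
        if k == key:
            return v
    return default

b1 = bucket_put([], "A", 1)
b2 = bucket_put(b1, "B", 2)     # B collides with A and joins the same bucket
b3 = bucket_put(b2, "A", 10)    # updating A replaces its pair
```

Since buckets are expected to hold only a handful of entries, the linear scan costs little and avoids the rehashing machinery altogether.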
