I am trying to come up with the best data structure for use in a high throughput C++ server. The data structure will be used to store anything from a few to several million obje
I'm not sure if anybody has mentioned this, but I would take inspiration from Java's ConcurrentHashMap. It offers traversal, retrieval and insertion without locking or waiting. The only lock occurs once you've found a bucket of data corresponding to the hash key and you're traversing that bucket (i.e. you ONLY lock the bucket not the actual hash map). "Instead of a single collection lock, ConcurrentHashMap uses a fixed pool of locks that form a partition over the collection of buckets."
You can find more details on the actual implementation here. I believe that all of the things shown in the implementation can be just as easily done with C++.
So let's go through your list of requirements:
1. High throughput. CHECK
2. Thread safe. CHECK
3. Efficient inserts happen in O(1). CHECK
4. Efficient removal (with no data races or locks). CHECK
5. VERY efficient traversal. CHECK
6. Does not lock or wait. CHECK
7. Easy on the memory. CHECK
8. It is scalable (just increase the lock pool). CHECK
Here is an example of a map Entry:
protected static class Entry implements Map.Entry {
protected final Object key;
protected volatile Object value;
protected final int hash;
protected final Entry next;
...
}
Note that the value is volatile, so when we're removing an Entry we set the value to NULL which is automatically visible to and any other thread that attempts to read the value.