I am trying to come up with the best data structure for use in a high-throughput C++ server. The data structure will be used to store anything from a few to several million objects.
The only way I can see to achieve this is through something similar to the multiversion concurrency control (MVCC) used in databases such as Oracle and PostgreSQL. It guarantees that readers don't block readers and writers don't block readers; writers block only those writers that update the same piece of data. That property of writers blocking the writer(s) that update the same piece of data is important in the concurrent programming world, because otherwise data/system inconsistencies are possible.

For each write operation to the data structure, you take a snapshot of the data structure, or at least of the portion of the nodes affected by the write, into a different location in memory before doing the write. So while a write is in progress and a reader thread requests data touched by the writer, you always refer it to the latest snapshot and iterate over that snapshot, thereby providing a consistent view of the data to all readers. Snapshots are costly because they consume more memory, but for your given requirement this technique is the right one to go for. And yes, use locks (mutex/semaphore/spinlock) to protect the write operation from other writer threads/processes needing to update the same piece of data.
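A minimal sketch of that copy-on-write/snapshot idea in C++, where the map type, the key/value types, and the class/function names are my own illustrative assumptions rather than anything from the question. Readers never take a lock; writers serialize only against each other:

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Copy-on-write snapshots: readers grab the current version without blocking;
// each write copies the structure, modifies the copy, then publishes it.
// Writers block each other (via the mutex), never the readers.
class SnapshotStore {
public:
    using Map = std::map<int, std::string>;

    // Reader path: just an atomic load of the current snapshot pointer.
    std::shared_ptr<const Map> snapshot() const {
        return std::atomic_load(&current_);
    }

    // Writer path: copy, modify the copy, publish atomically.
    void insert(int key, std::string value) {
        std::lock_guard<std::mutex> guard(writeMutex_);
        auto next = std::make_shared<Map>(*std::atomic_load(&current_)); // copy
        (*next)[key] = std::move(value);
        std::atomic_store(&current_, std::shared_ptr<const Map>(std::move(next)));
    }

private:
    mutable std::mutex writeMutex_;
    std::shared_ptr<const Map> current_ = std::make_shared<Map>();
};
```

Readers still iterating an older snapshot keep it alive through their own shared_ptr, so each version is reclaimed only once its last reader is done; in C++20 the free atomic_load/atomic_store functions can be replaced with std::atomic<std::shared_ptr<...>>.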
Apologies for the double-answer...
Since writes are fairly rare, you really should consider using STM (software transactional memory) instead of locking. STM is a form of optimistic locking, which means it is heavily biased in performance toward collision-free systems (a.k.a. fewer writes). By contrast, pessimistic locking (lock-write-unlock) is optimized for collision-heavy systems (a.k.a. lots of writes). The only catch with STM is that it almost demands you use immutable data structures within the TVar cells; otherwise the whole system breaks down. Personally, I don't think this is a problem, since a decent immutable data structure is going to be just as fast as a mutable one (see my other answer), but it's worth considering.
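C++ has no standard STM, so take the following only as a sketch of the optimistic flavour being described: a single cell holding an immutable value, updated with a compare-and-swap retry loop. All of the names here are made up, and a real STM would additionally let you compose reads and writes of several cells into one transaction:

```cpp
#include <atomic>
#include <memory>
#include <vector>

// Optimistic update in the spirit of STM: read the current value, compute a
// new immutable value from it, and publish only if nobody else committed in
// the meantime; otherwise retry. No writer ever blocks a reader.
template <typename T>
class TVarLike {
public:
    explicit TVarLike(T initial)
        : current_(std::make_shared<const T>(std::move(initial))) {}

    std::shared_ptr<const T> read() const {
        return std::atomic_load(&current_);
    }

    template <typename Fn>
    void update(Fn transform) {
        for (;;) {
            auto expected = std::atomic_load(&current_);
            auto desired =
                std::make_shared<const T>(transform(*expected)); // pure function of the old value
            if (std::atomic_compare_exchange_weak(&current_, &expected, desired))
                return; // committed
            // Another writer got in first: loop and retry against the newer value.
        }
    }

private:
    std::shared_ptr<const T> current_;
};

// Usage sketch, with an immutable std::vector<int> as the cell contents:
//   TVarLike<std::vector<int>> counts(std::vector<int>{});
//   counts.update([](const std::vector<int>& old) {
//       auto next = old;          // copy; never mutate the shared version
//       next.push_back(42);
//       return next;
//   });
```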
FWIW, this is trivial to solve if you have a garbage collector. In F#, for example, you can just use a mutable reference to a linked list or purely functional map (balanced binary tree) without any locks. This works because the data structures are immutable and writing a reference (to update after a write) is atomic so concurrent readers are guaranteed to see either the old or new data structure but never corruption. If you have multiple writers then you can serialize them.
However, this is much harder to solve in C++...
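For what it's worth, one way to approximate that pattern in C++ is a persistent (immutable) list whose head is published through the atomic shared_ptr operations, with reference counting standing in for the garbage collector. This is only a hedged sketch under that assumption: the names are made up, and it glosses over details such as the fact that destroying a very long chain of nodes recurses through the shared_ptr links.

```cpp
#include <memory>
#include <mutex>

// Persistent singly linked list: nodes are never mutated after construction,
// so any thread holding a head pointer can traverse it with no locks.
template <typename T>
struct Node {
    T value;
    std::shared_ptr<const Node> next;
};

template <typename T>
class PersistentList {
public:
    // Readers: grab the current head; that snapshot stays valid (and alive,
    // via reference counting) for as long as the shared_ptr is held.
    std::shared_ptr<const Node<T>> head() const {
        return std::atomic_load(&head_);
    }

    // Writers: serialize with a mutex, then publish a new head that shares
    // the existing tail, so prepending copies nothing.
    void push_front(T value) {
        std::lock_guard<std::mutex> guard(writeMutex_);
        auto oldHead = std::atomic_load(&head_);
        auto newHead = std::make_shared<const Node<T>>(
            Node<T>{std::move(value), oldHead});
        std::atomic_store(&head_, std::shared_ptr<const Node<T>>(newHead));
    }

private:
    mutable std::mutex writeMutex_;
    std::shared_ptr<const Node<T>> head_;
};
```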
I think a linked list should answer your requirements. Note that you can lock only the nodes that are being changed (i.e. deleted/appended), so most of the time readers can work in full parallelism with the writers. This approach would normally call for a lock per linked-list node, but that's not a must: you can have a limited number of locks and map several nodes to the same lock. That is, with an array of N locks and nodes numbered 0..M, you can use lock (NodeId % N) for locking a given node. Those can be read-write locks, and by controlling the number of locks you control the degree of parallelism.
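A hedged sketch of that striping scheme, using std::shared_mutex (C++17) as the read-write lock; the class and function names are just for illustration:

```cpp
#include <array>
#include <cstddef>
#include <shared_mutex>

// Lock striping: a fixed pool of reader-writer locks, with each node id
// mapped onto one stripe. More stripes means more write parallelism, at the
// cost of more memory spent on locks.
template <std::size_t N>
class LockStripes {
public:
    std::shared_mutex& forNode(std::size_t nodeId) {
        return stripes_[nodeId % N];
    }

private:
    std::array<std::shared_mutex, N> stripes_;
};

// Usage: readers take a shared lock on the stripe covering the node they
// touch, writers take an exclusive lock on it.
//
//   LockStripes<64> stripes;
//   {
//       std::shared_lock<std::shared_mutex> read(stripes.forNode(nodeId));
//       // ... read the node ...
//   }
//   {
//       std::unique_lock<std::shared_mutex> write(stripes.forNode(nodeId));
//       // ... unlink / append the node ...
//   }
```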
Linked lists are definitely the answer here. Insertion and deletion are O(1), iteration from one node to the next is O(1), and they are stable across operations. std::list guarantees all of these, including that all iterators remain valid unless the element is removed from the list (this includes pointers and references to elements). For locking, you could just wrap the list in a locking class, or you could write your own list class that supports node-based locking (you wouldn't be able to use std::list in this case). For example, you can lock down certain areas of the list for use while other threads perform operations on different areas. Which approach you use largely depends on the type of concurrent access you expect: if multiple operations on different parts of the list will be really common, write your own, but remember you will be putting a mutex object in each node, which isn't space-efficient.
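The "wrap the list in a locking class" option could look roughly like this coarse-grained sketch; the type and member names are mine, not from any library:

```cpp
#include <list>
#include <mutex>
#include <shared_mutex>

// Coarse-grained wrapper: one reader-writer lock guards the whole std::list.
// Simple and space-efficient, but a writer blocks all readers for the
// duration of its operation, which is the trade-off described above.
template <typename T>
class LockedList {
public:
    void push_back(T value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        items_.push_back(std::move(value));
    }

    void erase_first(const T& value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        for (auto it = items_.begin(); it != items_.end(); ++it) {
            if (*it == value) { items_.erase(it); return; }
        }
    }

    // Visit every element under a shared lock; concurrent readers do not
    // block one another.
    template <typename Fn>
    void for_each(Fn visit) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        for (const auto& item : items_) visit(item);
    }

private:
    mutable std::shared_mutex mutex_;
    std::list<T> items_;
};
```

The finer-grained, lock-per-node (or lock-per-region) variant trades this simplicity for more memory and more implementation work, as noted above.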