问题
What is the time complexity of applying the next()
and prev()
function on an multiset<int>::iterator
type object where the corresponding multiset contains the N
elements?
I understand that in STL, a multiset is implemented as a balanced binary search tree and hence I expect the time complexity to be O(log N) per operation (in the worst case) in case we just traverse the tree until we find the appropriate value, but I have a hunch that this should be O(1) on average.
But what if the tree is implemented as follows - when inserting element x
in the balanced binary search tree, we can also retrieve the the largest number in the tree smaller than x and the smallest number in the tree larger than x
in O(log N). Thus in theory, we can have each node in the tree maintain pointer to its next
and prev
elements so that next()
and prev()
then run in constant time per query.
Can anybody share some light on what's up?
回答1:
The standard mandates that all operations on iterators run in amortized constant time: http://www.eel.is/c++draft/iterator.requirements#general-10. The basic idea is that each iterator category only defines operations which can be implemented in amortized time.
Iteration is a common thing to do, and if operator++
on an iterator (I guess that's what you mean by next?) was logN, then traversing a container in a loop would be NlogN. The standard makes this impossible; since operator++
is amortized constant, iterating over any data structure in the standard is always O(N).
However, I dug into the implementation of multiset
on gcc5.4 to at least have one example. Both set
and multiset
are implemented in terms of the same underlying structure, _Rb_tree
. Delving into that structure a bit, it's nodes not only have left and right node pointers, but also a parent node pointer, and an iterator is just a pointer to a node.
Given a node in a binary search tree that includes a pointer to its parent, it's easy to figure out what the next node in the tree is:
- If it has a right child, descend to the right child. Then descend left child as far as you can; that's the next node.
- If it does not have a right child, ascend to your parent, and determine whether the original node is the left or right child of the parent. If the node is the left child of the parent, then the parent is the next node. If the node is the right of the parent, the parent was already processed, so you need to apply the same logic recursively between the parent and its grandparent.
This SO question shows the source code with the core logic: What is the definition of _Rb_tree_increment in bits/stl_tree.h? (it's surprisingly hard to find for some reason).
This does not have constant time, in particular in both 1. and 2. we have loops that either descend or ascend and could take at most log(N) time. However, you can easily convince yourself that the amortized time is constant because as you traverse the tree with this algorithm, each node is touched at most 4 times:
- Once on the way down to its left child.
- Once when it comes back up from the left child and needs to consider itself.
- Once when it descends to its right child.
- Once when ascending from the right child.
In retrospect I would say this is the fairly-obvious choice. Iteration over the whole data structure is a common operation, so the performance is very important. Adding a third pointer to the node is not a trivial amount of space, but it's not the end of the world either; at most it will bloat the data structure from 3 to 4 words (2 pointers + data, which alignment makes 3 at the minimum, vs 3 pointers + data). If you work with ranges, as opposed to two iterators, an alternative would be to maintain a stack and then you don't need the parent pointer, but this only works if you iterate from the very beginning to the end; it wouldn't allow iteration from an iterator in the middle to the end (which is also an important operation for BST's).
回答2:
I think the next() and prev() will take anywhere between 1 and h where h is the height of the tree which is approx O(log N). If you use next() to walk from beginning to end, N nodes, the iterator should visit the entire tree and that is about 2N (2 because the iterator has to traverse the downwards then upwards through the pointers that link the nodes). Total traversal is not O(N * log N) as some steps are better than other steps. At the very worst, a next() might be from a leaf node to the head node which is h approximately O(log N). But that will only occur twice (once to arrive at begin(), the second time at the right most node of the left tree to the head node). So on average next() and prev() are 2 which is O(1).
来源:https://stackoverflow.com/questions/46119328/c-running-time-of-next-and-prev-in-a-multiset-iterator