I have always assumed that std::lower_bound() runs in logarithmic time if I pass a pair of red-black tree iterators (set::iterator or map::iterator) to it.
There is no technical reason why this could not be implemented. To demonstrate, I will sketch out one way to do it.
We add a new iterator category, SkipableIterator. It is a subtype of BidirectionalIterator and a supertype of RandomAccessIterator.

SkipableIterators guarantee that the following function works when called in a context where std::between is visible:
template<typename SkipableIterator>
SkipableIterator between( SkipableIterator begin, SkipableIterator end );
between returns an iterator between begin and end. It returns end if and only if ++begin == end (end is right after begin).

Conceptually, between should efficiently find an element "about half way between" begin and end, but we should be careful to allow both a randomized skip list and a balanced red-black tree to work.
Random access iterators have a really simple implementation of between: return begin + ((end - begin) + 1) / 2;
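Spelled out as a compilable sketch (between is a hypothetical function, not part of the standard library):

// Hypothetical between() for random access iterators.
// Rounds up so that between(b, e) == e exactly when ++b == e.
template <typename RandomIt>
RandomIt between(RandomIt begin, RandomIt end)
{
    return begin + ((end - begin) + 1) / 2;
}

// With int* as the iterator:
//   int a[4] = {1, 2, 3, 4};
//   between(a, a + 4) points at a[2] (the midpoint, rounded up);
//   between(a, a + 1) == a + 1, as the contract requires.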
Adding a new tag is also easy. Derivation makes existing code keep working so long as it properly used tag dispatching (and did not explicitly specialize on the tag), but there is a small risk of breakage here. We could have "tag versions" where iterator_category_2 is a refinement of iterator_category (or something less hacky), or we could use a completely different mechanism to describe skipable iterators (an independent iterator trait?).
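To make the independent-trait idea concrete, here is one possible shape for it (everything below is hypothetical and only illustrates the mechanism):

#include <iterator>
#include <type_traits>

// Hypothetical trait: does this iterator support between()?
// By default only random access iterators qualify; a library could
// specialize it for its own tree or skip-list iterators.
template <typename Iterator>
struct is_skipable
    : std::is_base_of<std::random_access_iterator_tag,
                      typename std::iterator_traits<Iterator>::iterator_category>
{};

static_assert(is_skipable<int*>::value, "pointers are random access, hence skipable");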
Once we have this ability, we can write fast ordered searching algorithms that work on map/set and their multi counterparts. They would also work on a skip list container like QList. The implementation might even be the same as the random-access version!
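A sketch of what that shared implementation could look like, written only against the hypothetical between() above (skip_lower_bound is a made-up name):

// Assumes the between() sketched earlier is visible.
template <typename SkipableIterator>
SkipableIterator between(SkipableIterator begin, SkipableIterator end);

// Hypothetical lower_bound built on between(): it repeatedly splits
// [first, last) at whatever pivot the iterator can reach cheaply, so the
// same code works for arrays, balanced trees, and skip lists.
template <typename SkipableIterator, typename T, typename Compare>
SkipableIterator skip_lower_bound(SkipableIterator first, SkipableIterator last,
                                  const T& value, Compare comp)
{
    while (first != last)
    {
        SkipableIterator mid = between(first, last);
        if (mid == last)                        // last is right after first
            return comp(*first, value) ? last : first;
        if (comp(*mid, value))
            first = ++mid;                      // the answer lies after mid
        else
            last = mid;                         // mid may still be the answer
    }
    return first;
}

The number of between() calls is logarithmic only if between() really lands near the middle; that is exactly the guarantee the new category would have to spell out.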
(Elaborating on a comment)
I think it's possible to supply a predicate that is not equivalent to the one supplied to std::set and still fulfil the requirement of being partially sorted (for special sets). So you can only replace the lower_bound algorithm with a special red-black version if the predicate is equivalent to the std::set ordering.
Example:
#include <utility>
#include <algorithm>
#include <set>
#include <iostream>

struct ipair : std::pair<int,int>
{
    using pair::pair;
};

bool operator<(ipair const& l, ipair const& r)
{ return l.first < r.first; }

struct comp2nd
{
    bool operator()(ipair const& l, ipair const& r) const
    { return l.second > r.second; /* note the > */ }
};

std::ostream& operator<<(std::ostream& o, ipair const& e)
{ return o << "[" << e.first << "," << e.second << "]"; }

int main()
{
    std::set<ipair, comp2nd> my_set = {{0,4}, {1,3}, {2,2}, {3,1}, {4,0}};
    for(auto const& e : my_set) std::cout << e << ", ";
    std::cout << "\n\n";

    // my_set is sorted wrt ::operator<(ipair const&, ipair const&)
    // and wrt comp2nd
    std::cout << std::is_sorted(my_set.cbegin(), my_set.cend()) << "\n";
    std::cout << std::is_sorted(my_set.cbegin(), my_set.cend(),
                                comp2nd()) << "\n";
    std::cout << "\n\n";

    // implicitly using operator<
    auto res = std::lower_bound(my_set.cbegin(), my_set.cend(), ipair{3, -1});
    std::cout << *res;
    std::cout << "\n\n";

    auto res2 = std::lower_bound(my_set.cbegin(), my_set.cend(), ipair{-1, 3},
                                 comp2nd());
    std::cout << *res2;
}
Output:
[0,4], [1,3], [2,2], [3,1], [4,0], 

1
1


[3,1]

[1,3]
There are multiple reasons:
- The map's predicate cannot be applied directly to a range from a std::map<K, V>, as the map predicate operates on Ks while the range operates on pairs of K and V (see the small example after this list).
- The iterators passed to the algorithm are not required to be the t.begin() and the t.end() of the tree. They can be somewhere in the tree, making the use of the tree structure potentially inefficient.

The part I consider questionable is the use of a generic name for an algorithm which has linear complexity with bidirectional iterators and logarithmic complexity with random access iterators (I understand that the number of comparisons has logarithmic complexity in both cases and that the movements are considered to be fast).
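As a small illustration of the key/value mismatch (the comparator below is written out only for this example):

#include <algorithm>
#include <iostream>
#include <map>

int main()
{
    std::map<int, const char*> m = {{1, "one"}, {2, "two"}, {3, "three"}};

    // The map's own predicate (std::less<int>) compares keys, but the range
    // consists of std::pair<const int, const char*>, so std::lower_bound
    // needs a comparator that projects out the key.
    auto it = std::lower_bound(
        m.begin(), m.end(), 2,
        [](std::pair<const int, const char*> const& element, int key)
        { return element.first < key; });

    if (it != m.end())
        std::cout << it->first << " -> " << it->second << "\n";  // 2 -> two
}

Note that this call still walks the range linearly, which is exactly the behaviour the question is about.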
Here's a very simple non-technical reason: It's not required by the standard, and any future change will break backwards compatibility with existing compiled code for no reason.
Wind the clock back to the early 2000s, during the transition between GCC 2 and GCC 3, and later, during minor revisions of GCC 3. Many of the projects I worked on were meant to be binary compatible; we could not require the user to recompile our programs or plugins, and neither could we be certain of the version of GCC they were compiled with or the version of the STL they were compiled against.
The solution: don't use the STL. We had in-house strings, vectors, and tries rather than using the STL. The dependency hell introduced by an ostensibly standard part of the language was so great that we abandoned it. Not in just one or two projects, either.
This problem has largely gone away, thankfully, and libraries such as Boost have stepped in to provide header-only versions of the STL containers. With GCC 4, I would see no issue with using standard STL containers, and indeed, binary compatibility is much easier, largely due to standardization efforts.
But your change would introduce a new, unspoken dependency.
Suppose tomorrow a new data structure comes out that substantially beats red-black trees but does not guarantee that such specialized iterators are available. One such implementation that was very popular just a few years ago was the skip list, which offered the same guarantees at a possibly substantially smaller memory footprint. The skip list didn't seem to pan out, but another data structure very well could. My personal preference is to use tries, which offer substantially better cache performance and more robust algorithmic performance; their iterators would be substantially different from a red-black tree's, should someone working on libstdc++ decide that these structures offer better all-around performance for most usages.
By following the standard strictly, binary backwards compatibility can be maintained even in the face of data structure changes. This is a Good Thing (TM) for a library meant to be used dynamically. For one that would be used statically, such as the Boost Container library, I would not bat an eye if such optimizations were both well implemented and well used.
But for a dynamic library such as libstdc++, binary backwards compatibility is much more important.
Great question. I honestly think there's no good/convincing/objective reason for this.
Almost all the reasons I see here (e.g. the predicate requirement) are non-issues to me. They might be inconvenient to solve, but they're perfectly solvable (e.g. just require a typedef to distinguish predicates).
The most convincing reason I see in the topmost answer is:
Although it is likely that there are parent pointers, requiring so for the tree seems inappropriate.
However, I think it's perfectly reasonable to assume parent pointers are implemented.
Why? Because the time complexity of set::insert(iterator, value)
is guaranteed to be amortized constant time if the iterator points to the correct location.
Consider: how can you possibly avoid storing parent pointers here? Without parent pointers, in order to ensure the tree is balanced after the insertion, the tree would have to be traversed starting from the root every single time, which is certainly not amortized constant time.
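For reference, the guarantee in question looks like this in use:

#include <set>

int main()
{
    std::set<int> s;

    // Inserting in sorted order with end() as the hint: each element goes
    // immediately before the hint, so each insertion is amortized O(1).
    // Rebalancing after such an insertion is hard to imagine without a way
    // to walk back up from the insertion point, e.g. parent pointers.
    for (int i = 0; i < 1000; ++i)
        s.insert(s.end(), i);
}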
I obviously can't mathematically prove there exists no data structure that can provide this guarantee, so there's clearly the possibility that I'm wrong and this is possible.
However, in the absence of such data structures, what I'm saying is that this is a reasonable assumption, given that all the implementations of set and map I've seen are in fact red-black trees.
Side note: we simply couldn't partially specialize function templates (like lower_bound) in C++03. But that's not really a problem, because we could have just specialized a type instead and forwarded the call to a member function of that type.
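A minimal sketch of that workaround (lower_bound_impl and my_lower_bound are made-up names):

#include <algorithm>

// Primary template: falls back to the generic algorithm.
template <typename Iterator>
struct lower_bound_impl
{
    template <typename T>
    static Iterator apply(Iterator first, Iterator last, const T& value)
    {
        return std::lower_bound(first, last, value);
    }
};

// A library could partially specialize lower_bound_impl for its own tree
// iterators, even though the function template itself cannot be partially
// specialized.

template <typename Iterator, typename T>
Iterator my_lower_bound(Iterator first, Iterator last, const T& value)
{
    return lower_bound_impl<Iterator>::apply(first, last, value);
}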