I have always assumed that std::lower_bound() runs in logarithmic time if I pass a pair of red-black tree iterators (set::iterator or map::iterator) to it.
There is no technical reason why this could not be implemented. To demonstrate, I will sketch out one way to do it.
We add a new iterator category, SkipableIterator. It is a subtype of BidirectionalIterator and a supertype of RandomAccessIterator.

SkipableIterators guarantee that the following function works when called in a context where std::between is visible:
template<typename SkipableIterator>
SkipableIterator between( SkipableIterator begin, SkipableIterator end );
between returns an iterator between begin and end. It returns end if and only if ++begin == end (end is right after begin).

Conceptually, between should efficiently find an element "about half way between" begin and end, but we should be careful to allow both a randomized skip list and a balanced red-black tree to work.
Random access iterators have a really simple implementation of between: return begin + ((end - begin) + 1) / 2;
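Spelled out as a compilable sketch (between is a hypothetical function, not part of the standard library):

// Hypothetical between() for random access iterators.
// Rounds up so that between(b, e) == e exactly when ++b == e.
template <typename RandomIt>
RandomIt between(RandomIt begin, RandomIt end)
{
    return begin + ((end - begin) + 1) / 2;
}

// With int* as the iterator:
//   int a[4] = {1, 2, 3, 4};
//   between(a, a + 4) points at a[2] (the midpoint, rounded up);
//   between(a, a + 1) == a + 1, as the contract requires.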
Adding a new tag is also easy. Derivation makes existing code keep working so long as it properly used tag dispatching (and did not explicitly specialize on the tag), but there is a small risk of breakage here. We could have "tag versions" where iterator_category_2 is a refinement of iterator_category (or something less hacky), or we could use a completely different mechanism to describe skipable iterators (an independent iterator trait?).
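To make the independent-trait idea concrete, here is one possible shape for it (everything below is hypothetical and only illustrates the mechanism):

#include <iterator>
#include <type_traits>

// Hypothetical trait: does this iterator support between()?
// By default only random access iterators qualify; a library could
// specialize it for its own tree or skip-list iterators.
template <typename Iterator>
struct is_skipable
    : std::is_base_of<std::random_access_iterator_tag,
                      typename std::iterator_traits<Iterator>::iterator_category>
{};

static_assert(is_skipable<int*>::value, "pointers are random access, hence skipable");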
Once we have this ability, we can write fast ordered searching algorithms that work on map/set and their multi counterparts. They would also work on a skip list container like QList. The implementation might even be the same as the random-access version!
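A sketch of what that shared implementation could look like, written only against the hypothetical between() above (skip_lower_bound is a made-up name):

// Assumes the between() sketched earlier is visible.
template <typename SkipableIterator>
SkipableIterator between(SkipableIterator begin, SkipableIterator end);

// Hypothetical lower_bound built on between(): it repeatedly splits
// [first, last) at whatever pivot the iterator can reach cheaply, so the
// same code works for arrays, balanced trees, and skip lists.
template <typename SkipableIterator, typename T, typename Compare>
SkipableIterator skip_lower_bound(SkipableIterator first, SkipableIterator last,
                                  const T& value, Compare comp)
{
    while (first != last)
    {
        SkipableIterator mid = between(first, last);
        if (mid == last)                        // last is right after first
            return comp(*first, value) ? last : first;
        if (comp(*mid, value))
            first = ++mid;                      // the answer lies after mid
        else
            last = mid;                         // mid may still be the answer
    }
    return first;
}

The number of between() calls is logarithmic only if between() really lands near the middle; that is exactly the guarantee the new category would have to spell out.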
(Elaborating on a comment)
I think it's possible to supply a predicate that is not equivalent to the one supplied to std::set and still fulfil the requirement of being partially sorted (for special sets). So you can only replace the lower_bound algorithm with a special red-black version if the predicate is equivalent to the std::set ordering.
Example:
#include <utility>
#include <algorithm>
#include <set>
#include <iostream>

struct ipair : std::pair<int,int>
{
    using pair::pair;
};

bool operator<(ipair const& l, ipair const& r)
{ return l.first < r.first; }

struct comp2nd
{
    bool operator()(ipair const& l, ipair const& r) const
    { return l.second > r.second; /* note the > */ }
};

std::ostream& operator<<(std::ostream& o, ipair const& e)
{ return o << "[" << e.first << "," << e.second << "]"; }

int main()
{
    std::set<ipair, comp2nd> my_set = {{0,4}, {1,3}, {2,2}, {3,1}, {4,0}};
    for(auto const& e : my_set) std::cout << e << ", ";
    std::cout << "\n\n";

    // my_set is sorted wrt ::operator<(ipair const&, ipair const&)
    // and wrt comp2nd
    std::cout << std::is_sorted(my_set.cbegin(), my_set.cend()) << "\n";
    std::cout << std::is_sorted(my_set.cbegin(), my_set.cend(),
                                comp2nd()) << "\n";
    std::cout << "\n\n";

    // implicitly using operator<
    auto res = std::lower_bound(my_set.cbegin(), my_set.cend(), ipair{3, -1});
    std::cout << *res;
    std::cout << "\n\n";

    auto res2 = std::lower_bound(my_set.cbegin(), my_set.cend(), ipair{-1, 3},
                                 comp2nd());
    std::cout << *res2;
}
Output:
[0,4], [1,3], [2,2], [3,1], [4,0], 

1
1


[3,1]

[1,3]
There are multiple reasons:
- The map's predicate cannot be applied directly to a range from a std::map<K, V>, as the map predicate operates on Ks while the range operates on pairs of K and V (see the small example after this list).
- The iterators passed to the algorithm are not required to be the t.begin() and the t.end() of the tree. They can be somewhere in the tree, making the use of the tree structure potentially inefficient.

The part I consider questionable is the use of a generic name for an algorithm which has linear complexity with bidirectional iterators and logarithmic complexity with random access iterators (I understand that the number of comparisons has logarithmic complexity in both cases and that the movements are considered to be fast).
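As a small illustration of the key/value mismatch (the comparator below is written out only for this example):

#include <algorithm>
#include <iostream>
#include <map>

int main()
{
    std::map<int, const char*> m = {{1, "one"}, {2, "two"}, {3, "three"}};

    // The map's own predicate (std::less<int>) compares keys, but the range
    // consists of std::pair<const int, const char*>, so std::lower_bound
    // needs a comparator that projects out the key.
    auto it = std::lower_bound(
        m.begin(), m.end(), 2,
        [](std::pair<const int, const char*> const& element, int key)
        { return element.first < key; });

    if (it != m.end())
        std::cout << it->first << " -> " << it->second << "\n";  // 2 -> two
}

Note that this call still walks the range linearly, which is exactly the behaviour the question is about.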
Here's a very simple non-technical reason: It's not required by the standard, and any future change will break backwards compatibility with existing compiled code for no reason.
Wind the clock back to the early 2000s, during the transition between GCC 2 and GCC 3, and later, during minor revisions of GCC 3. Many of the projects I worked on were meant to be binary compatible; we could not require the user to recompile our programs or plugins, and neither could we be certain of the version of GCC they were compiled with or the version of the STL they were compiled against.
The solution: don't use the STL. We had in-house strings, vectors, and tries rather than using the STL. The dependency hell introduced by an ostensibly standard part of the language was so great that we abandoned it. Not in just one or two projects, either.
This problem has largely gone away, thankfully, and libraries such as Boost have stepped in to provide header-only versions of the STL containers. With GCC 4, I would see no issue with using standard STL containers, and indeed, binary compatibility is much easier, largely due to standardization efforts.
But your change would introduce a new, unspoken dependency.
Suppose tomorrow a new data structure comes out that substantially beats red-black trees but does not guarantee that such specialized iterators are available. One such implementation that was very popular just a few years ago was the skip list, which offered the same guarantees at a possibly substantially smaller memory footprint. The skip list didn't seem to pan out, but another data structure very well could. My personal preference is to use tries, which offer substantially better cache performance and more robust algorithmic performance; their iterators would be substantially different from a red-black tree's, should someone working on libstdc++ decide that these structures offer better all-around performance for most usages.
By following the standard strictly, binary backwards compatibility can be maintained even in the face of data structure changes. This is a Good Thing (TM) for a library meant to be used dynamically. For one that would be used statically, such as the Boost Container library, I would not bat an eye if such optimizations were both well implemented and well used.
But for a dynamic library such as libstdc++, binary backwards compatibility is much more important.
Great question. I honestly think there's no good/convincing/objective reason for this.
Almost all the reasons I see here (e.g. the predicate requirement) are non-issues to me. They might be inconvenient to solve, but they're perfectly solvable (e.g. just require a typedef to distinguish predicates).
The most convincing reason I see in the topmost answer is:
Although it is likely that there are parent pointers, requiring so for the tree seems inappropriate.
However, I think it's perfectly reasonable to assume parent pointers are implemented.
Why? Because the time complexity of set::insert(iterator, value)
is guaranteed to be amortized constant time if the iterator points to the correct location.
Consider: how can you possibly avoid storing parent pointers here? Without parent pointers, in order to ensure the tree is balanced after the insertion, the tree would have to be traversed starting from the root every single time, which is certainly not amortized constant time.
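For reference, the guarantee in question looks like this in use:

#include <set>

int main()
{
    std::set<int> s;

    // Inserting in sorted order with end() as the hint: each element goes
    // immediately before the hint, so each insertion is amortized O(1).
    // Rebalancing after such an insertion is hard to imagine without a way
    // to walk back up from the insertion point, e.g. parent pointers.
    for (int i = 0; i < 1000; ++i)
        s.insert(s.end(), i);
}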
I obviously can't mathematically prove there exists no data structure that can provide this guarantee, so there's clearly the possibility that I'm wrong and this is possible.
However, in the absence of such data structures, what I'm saying is that this is a reasonable assumption, given that all the implementations of set and map I've seen are in fact red-black trees.
Side note: we simply couldn't partially specialize function templates (like lower_bound) in C++03. But that's not really a problem, because we could have just specialized a type instead and forwarded the call to a member function of that type.
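A minimal sketch of that workaround (lower_bound_impl and my_lower_bound are made-up names):

#include <algorithm>

// Primary template: falls back to the generic algorithm.
template <typename Iterator>
struct lower_bound_impl
{
    template <typename T>
    static Iterator apply(Iterator first, Iterator last, const T& value)
    {
        return std::lower_bound(first, last, value);
    }
};

// A library could partially specialize lower_bound_impl for its own tree
// iterators, even though the function template itself cannot be partially
// specialized.

template <typename Iterator, typename T>
Iterator my_lower_bound(Iterator first, Iterator last, const T& value)
{
    return lower_bound_impl<Iterator>::apply(first, last, value);
}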