Why is std::rotate so fast?

野的像风 2021-02-01 03:57

Why is std::rotate so much faster than the equivalent function that cplusplus.com describes?

cplusplus.com's implementation:

    template <class ForwardIterator>
    void rotate (ForwardIterator first, ForwardIterator middle, ForwardIterator last)
    {
      ForwardIterator next = middle;
      while (first != next)
      {
        // swap the front element into its final position
        swap (*first++, *next++);
        if (next == last) next = middle;
        else if (first == middle) middle = next;
      }
    }
    
2 Answers
  • 2021-02-01 04:00

    As the commenters already stated, it depends on your Standard Library implementation. But the code that you posted is valid even for forward iterators. As such, it imposes very few requirements (only that these iterators can be incremented, compared, and dereferenced).

    Stepanov's classic Elements of Programming devotes an entire chapter (10) to rotate and other rearrangement algorithms. For forward iterators, the series of swaps in your code performs O(3N) assignments. For bidirectional iterators, three consecutive calls to reverse yield another O(3N) algorithm. For random access iterators, std::rotate can be implemented with O(N) assignments by first defining a permutation of indices w.r.t. the starting iterator.

    All the above algorithms are in-place. Using a memory buffer, it is possible that the random access version can benefit from the greater cache locality of memcpy() or memmove() (if the underlying value type is POD) in which entire blocks of contiguous memory can be swapped. If your insertion sort is done on an array or std::vector, it is likely that your Standard Library will take advantage of this optimization.

    TL;DR: trust your Standard Library and don't reinvent the wheel!

  • 2021-02-01 04:02

    Edit:

    Since the context is not given, it is not clear if your code calls std::swap() or another swap(a,b) algorithm like

    T tmp = a; a = b; b = tmp;
    

    When a and b are vectors of 1000 ints each, this would copy all vector elements 3 times. The specialized version of std::swap() for containers like std::vector<T> calls the container's a.swap(b) member instead, essentially swapping only the containers' internal data pointers.

    Also, for different iterator types the std::rotate() implementation can utilize some optimizations (see my older, possibly misleading answer below).


    Caveat: How std::rotate() is implemented depends on your Standard Library. For different iterator categories, different algorithms can be utilized (e.g. look for __rotate( in the bits/stl_algo.h header of GNU g++).

    To shift n elements by m=std::distance(first,middle), a simple (naive) algorithm such as m rotations by one element needs O(n*m) move or copy operations. But only O(n) moves are needed when each element is placed directly into its final position, which results in a (roughly) m times faster algorithm.

    An example for illustration: Rotate a string s = "abcdefg" by three elements:

    abcdefg : store 'a' in temporary place
    dbcdefg : move s[3] to s[0] (where it belongs in the end, directly)
    dbcgefg : move s[6] to s[3]
    dbcgefc : move s[9%7] to s[6] (wrapping index modulo container size: 9%7 == 2)
    dbfgefc : move s[5] to s[2]
    dbfgebc : move s[8%7] to s[5] (another wrap-around: 8%7 == 1)
    defgebc : move s[4] to s[1]
    defgabc : move 'a' from temporary place to s[4]
    

    For n and m with greatest common divisor 1 you are done now. Otherwise the permutation decomposes into gcd(n,m) such cycles, so you have to repeat that scheme gcd(n,m) times, starting at each of the first gcd(n,m) consecutive elements (n > m assumed here). This slightly more complicated algorithm is much faster.

    For bidirectional iterators another legendary O(3n) algorithm can be used, referred to as "flipping hands". According to Jon Bentley's book Programming Pearls it was used in early UNIX editors for moving text:

    Place your hands in front of you, one above the other, thumbs up. Now

    1. Turn one hand.
    2. Turn the other.
    3. Turn both, connected to each other.

    In code:

    reverse(first, middle);
    reverse(middle, last);
    reverse(first, last);
    

    For random access iterators large chunks of memory can be relocated by swap_ranges() (or memmove() operations for POD types).

    Micro-optimization utilising assembler operations can give a small extra amount of acceleration; it can be applied on top of the fastest algorithm.

    Algorithms that access consecutive elements instead of "hopping around" in memory also cause fewer cache misses on modern computer architectures.
