How fast is std::swap for integer types?

谎友^ 2021-01-07 19:45

STL implements a generic std::swap function to swap 2 values. It can be presented in the following way:

template <class T> void swap (T& ...


        
4 Answers
  • 2021-01-07 20:14

    In the vast majority of situations, XOR swap is not an optimisation.

    See the Wikipedia entry on the XOR swap algorithm, which says:

    In most practical scenarios, the trivial swap algorithm using a temporary register is more efficient. Limited situations in which XOR swapping may be practical include:

    • On a processor where the instruction set encoding permits the XOR swap to be encoded in a smaller number of bytes;
    • In a region with high register pressure, it may allow the register allocator to avoid spilling a register.
    • In microcontrollers where available RAM is very limited.

    Because these situations are rare, most optimizing compilers do not generate XOR swap code.

    Also note that your implementation of XOR swap is broken. You need to first check that x and y aren't aliased. This check will definitely make XOR swap slower.
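    A minimal sketch of the aliasing check mentioned above. The name `xor_swap_checked` is hypothetical and not taken from the question; it only illustrates the extra branch a correct XOR swap has to carry.

```cpp
#include <type_traits>

// Hypothetical helper (not from the original question): an XOR swap that is
// guarded against aliasing. Without the address check, calling it with the
// same object for both arguments would zero that object.
template <class T>
void xor_swap_checked(T& x, T& y)
{
    static_assert(std::is_integral<T>::value,
                  "XOR swap only makes sense for integer types");
    if (&x == &y)   // both references name the same object: nothing to do
        return;
    x ^= y;         // x = x0 ^ y0
    y ^= x;         // y = y0 ^ (x0 ^ y0) = x0
    x ^= y;         // x = (x0 ^ y0) ^ x0 = y0
}
```

    The comparison of addresses is the extra work that the plain temporary-based swap does not need.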

    I'm not aware of any standard library implementation that uses XOR swap.

    Note that, regardless of what the standard library implements, if XOR swap were really faster than normal swap then optimizing compilers would do a peephole optimization to turn it into an XOR swap. This really is a case of just letting the compiler choose for you.

  • 2021-01-07 20:17

    XOR swap is really only a gimmick and can fail in certain cases (e.g. both variables are references to the same object).

    XOR swap is also not particularly efficient as it has serial dependencies so it will always take at least three instruction cycles. Using a straightforward swap with a temporary has fewer dependencies, allowing for some parallelism on modern superscalar CPUs - on some CPUs it can even be implemented in one instruction, but even without special instructions it may well execute in two cycles.
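    Both points can be illustrated with a short sketch. The function names are hypothetical, and the comments describe the data dependencies in the source code, not the machine code a particular compiler would emit.

```cpp
// Hypothetical unguarded XOR swap, written only to show the failure mode
// and the dependency chain.
inline void xor_swap_naive(int& x, int& y)
{
    x ^= y;   // step 1
    y ^= x;   // step 2 needs the result of step 1
    x ^= y;   // step 3 needs the result of step 2
}

// Swap through a temporary: the two reads are independent of each other,
// and so are the two writes.
inline void temp_swap(int& x, int& y)
{
    int temp = x;
    x = y;
    y = temp;
}

// Returns what a variable holds after being XOR-swapped with itself.
inline int self_xor_swap_result(int value)
{
    xor_swap_naive(value, value);  // x and y alias the same object
    return value;                  // step 1 computed value ^ value, i.e. 0
}

// Same experiment with the temporary-based swap: the value survives.
inline int self_temp_swap_result(int value)
{
    temp_swap(value, value);
    return value;
}
```

    With these definitions, `self_xor_swap_result(42)` yields 0 while `self_temp_swap_result(42)` yields 42.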

  • 2021-01-07 20:19

    As has already been explained, in most scenarios the XOR bit-fiddling will be slower.

    But it also depends a lot on the surrounding code. Let's say that this swap is done in isolation, far away from any other code that uses those values (so they are not already loaded into registers), and that we are working with "normal" x86 processors.

    Any algorithm that swaps the 2 values will at least need 2 operations to load the values from memory into registers and another 2 operations to store those values to memory again (x86 does not have operations to swap the content of 2 memory-locations directly).

    When using a temp-variable like so:

    void swap (int& a, int& b)
    {
      int temp = a;
      a = b;
      b = temp;
    }
    

    Basically any compiler will recognize that 'temp' is only used locally for the swap and will not give it a memory location. And as it only holds the value of 'a', it will not even need a separate register.

    The assembly-code of that will look something like this (pseudo-assembly):

    load a to rA
    load b to rB
    store rA to b
    store rB to a
    

    So in most scenarios this is the most efficient version possible in terms of memory accesses, number of instructions and number of registers.

    Only if the compiler fails to recognize that 'temp' is not used for anything else, and stores it in a separate register (or, worse, in actual memory), could the XOR variant be more efficient in any respect.

    But this is still purely theoretical, because your swap will be surrounded by other code, and that code will matter far more. If the values are not used anymore, the whole swap will be dropped. If the values are used directly afterwards in other computations, the compiler may simply generate the following code with the two registers exchanged, so the swap itself costs 0 instructions. And you will be really hard pressed to find any solution that is more efficient than literally having nothing to do.
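    As a rough illustration of the "0 instructions" case, consider the sketch below. The function names are hypothetical, and the claim about generated code is an expectation for a typical optimizing compiler, not something the original answer measured.

```cpp
#include <utility>

// Hypothetical example: the swapped values are consumed right away, so an
// optimizing compiler is free to just read the operands in the other order.
inline int sub_after_swap(int a, int b)
{
    std::swap(a, b);   // a and b are local copies that can live in registers
    return a - b;      // after the swap this is (original b) - (original a)
}

// The same computation written without any swap. One would expect an
// optimizing build to produce equivalent code for both functions.
inline int sub_reversed(int a, int b)
{
    return b - a;
}
```

    Comparing the assembly of the two functions in an optimized build is one way to check whether the swap survives in your own toolchain.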

    And of course there are other, more obscure instruction sets that might have instructions to directly swap the contents of 2 memory locations.

  • 2021-01-07 20:32

    On x86, a triple XOR swap between memory locations (not CPU registers) takes the same number of processor cycles as a triple copy. The copy can take even fewer cycles if the temporary is a register.
