Branchless code that maps zero, negative, and positive to 0, 1, 2

南笙 2020-12-01 11:21

Write a branchless function that returns 0, 1, or 2 if the difference between two signed integers is zero, negative, or positive.

Here's a version with branching:
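The branching version itself is missing from this copy of the question. Below is a minimal reference sketch of what it presumably looked like. It assumes the name Compare used in the answers, and it compares x and y directly instead of computing x - y. The two are equivalent whenever the subtraction does not overflow.

```c
#include <assert.h>  /* for the usage checks */

/* Hypothetical reconstruction of the branching reference version:
   0 if the difference x - y is zero, 1 if negative, 2 if positive.
   Comparing directly avoids the overflow that x - y can cause. */
int Compare(int x, int y)
{
    if (x == y)
        return 0;
    else if (x < y)   /* difference is negative */
        return 1;
    else              /* difference is positive */
        return 2;
}

/* Usage: assert(Compare(2, 5) == 1); assert(Compare(5, 2) == 2); */
```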

  • 2020-12-01 12:04

    Branchless (at the language level) code that maps negative to -1, zero to 0, and positive to +1 looks like this:

    int c = (n > 0) - (n < 0);
    

    If you need a different mapping, you can simply use an explicit lookup table to remap it:

    const int MAP[] = { 1, 0, 2 };
    int c = MAP[(n > 0) - (n < 0) + 1];
    

    Or, for the requested mapping, use a numerical trick such as:

    int c = 2 * (n > 0) + (n < 0);
    

    (It is obviously very easy to generate any mapping from this as long as 0 is mapped to 0, and the code stays quite readable. If 0 is mapped to something else, it becomes trickier and less readable.)
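
    To make the remark about mappings concrete, here is a minimal sketch. It assumes a hypothetical helper map_sign with arbitrary target values neg, zero and pos. Any mapping is a linear combination of the two comparison results. When zero is 0 the first term vanishes, leaving pos * (n > 0) + neg * (n < 0), which is the form used above. A nonzero zero value adds the extra terms that make it less readable.

    ```c
    #include <assert.h>  /* for the usage checks */

    /* Hypothetical helper: map the sign of n to arbitrary values without
       language-level branches. (n > 0) and (n < 0) each evaluate to 0 or 1,
       and at most one of them is 1. Intended for small target constants,
       since (pos - zero) and (neg - zero) must not overflow. */
    int map_sign(int n, int neg, int zero, int pos)
    {
        return zero + (pos - zero) * (n > 0) + (neg - zero) * (n < 0);
    }

    /* Usage:
       map_sign(n, 1, 0, 2)   is the requested mapping, 2 * (n > 0) + (n < 0)
       map_sign(n, -1, 5, 10) maps zero to a nonzero value */
    ```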

    As an additional note: comparing two integers by subtracting one from the other at the C language level is a flawed technique, since the subtraction can overflow (for example, INT_MIN - 1). The beauty of the above methods is that they can immediately be used for "subtractionless" comparisons, such as:

    int c = 2 * (x > y) + (x < y);
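
    Here is a minimal sketch of that subtraction-free comparison wrapped in a function. The lowercase name compare is chosen for illustration. It is exercised on inputs where x - y would overflow: INT_MIN - 1 and INT_MAX - (-1) are undefined behaviour for int, but the relational operators are always well defined.

    ```c
    #include <assert.h>  /* for the usage checks */
    #include <limits.h>  /* INT_MIN, INT_MAX */

    /* Subtraction-free comparison: 0 if x == y, 1 if x < y, 2 if x > y.
       Each relational operator yields 0 or 1, so no intermediate value
       can overflow, whatever the inputs. */
    int compare(int x, int y)
    {
        return 2 * (x > y) + (x < y);
    }

    /* Usage: assert(compare(INT_MIN, 1) == 1); where INT_MIN - 1 would overflow */
    ```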
    
  • 2020-12-01 12:04

    Unsigned comparison that returns -1, 0, or 1 (cmpu) is one of the test cases of the GNU SuperOptimizer.

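    /* cmpu: compare (unsigned) */
    /* unsigned_word is the superoptimizer's word type; assumed 32-bit here */
    typedef unsigned int unsigned_word;

    int cmpu(unsigned_word v0, unsigned_word v1)
    {
        /* 1 if v0 > v1, -1 if v0 < v1, 0 if equal */
        return (v0 > v1) ? 1 : ((v0 < v1) ? -1 : 0);
    }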
    

    A superoptimizer exhaustively searches the instruction space for the shortest combination of instructions that implements a given function. It has been suggested that compilers automatically replace functions like the one above with their superoptimized versions (although not all compilers do this). For example, the PowerPC Compiler Writer's Guide (powerpc-cwg.pdf) shows the cmpu function like this in Appendix D, p. 204:

    cmpu: compare (unsigned)
    PowerPC SuperOptimized Version
    subf  R5,R4,R3
    subfc R6,R3,R4
    subfe R7,R4,R3
    subfe R8,R7,R5
    

    That's pretty good, isn't it? Just four subtractions (using the carrying and/or extended forms), and it is genuinely branch-free at the machine-opcode level. There is probably an x86 sequence that is similarly short, since the GNU SuperOptimizer targets x86 as well as PowerPC.

    Note that unsigned comparison (cmpu) can be turned into signed comparison (cmps) on 32-bit words by adding 0x80000000 to both signed inputs (which flips their sign bits) before passing them to cmpu. Do the addition in unsigned arithmetic so that it cannot overflow.

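    /* cmps: compare (signed), built on cmpu */
    typedef int signed_word;  /* assumed 32-bit, like unsigned_word */

    int cmps(signed_word v0, signed_word v1)
    {
        /* Adding 0x80000000 in unsigned arithmetic flips the sign bit,
           mapping signed order onto unsigned order without overflow. */
        const unsigned_word offset = 0x80000000u;
        return cmpu((unsigned_word)v0 + offset, (unsigned_word)v1 + offset);
    }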
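
    The sign-bit trick behind cmps can be checked on its own. Below is a minimal sketch that uses int32_t/uint32_t from stdint.h, with a hypothetical helper named bias. After the offset is added modulo 2^32, INT32_MIN lands on 0, -1 on 0x7FFFFFFF, 0 on 0x80000000 and INT32_MAX on 0xFFFFFFFF. The signed order is therefore preserved as unsigned order.

    ```c
    #include <assert.h>  /* for the usage checks */
    #include <stdint.h>  /* int32_t, uint32_t, INT32_MIN, INT32_MAX */

    /* Hypothetical helper: map a signed 32-bit value to an unsigned one with
       the same ordering. The conversion to uint32_t happens first, so the
       addition is unsigned (wraps modulo 2^32) and never overflows. */
    uint32_t bias(int32_t v)
    {
        return (uint32_t)v + 0x80000000u;
    }

    /* Usage: assert(bias(-5) < bias(3)); signed order survives as unsigned order */
    ```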
    

    This is just one option, though; the SuperOptimizer may find a shorter cmps that doesn't need to add offsets and call cmpu.

    To get the version you requested, returning your values {1, 0, 2} rather than {-1, 0, 1}, use the following code, which builds on the superoptimized cmps function:

    int Compare(int x, int y)
    {
        static const int retvals[]={1,0,2};
        return (retvals[cmps(x,y)+1]);
    }
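
    For reference, here is a self-contained sketch of the whole chain. It assumes 32-bit words through uint32_t/int32_t, which stand in for the superoptimizer's unsigned_word and signed_word. It also writes cmpu with the (a > b) - (a < b) comparison trick rather than ternaries. No actual superoptimizer output is involved, and whether the emitted machine code is branch-free is up to the compiler.

    ```c
    #include <assert.h>  /* for the usage checks */
    #include <stdint.h>  /* int32_t, uint32_t */

    /* cmpu: -1, 0 or 1 for v0 <, ==, > v1 (unsigned). */
    static int cmpu(uint32_t v0, uint32_t v1)
    {
        return (v0 > v1) - (v0 < v1);
    }

    /* cmps: signed compare via cmpu. Adding 0x80000000 in unsigned
       arithmetic flips the sign bit, mapping signed order onto unsigned. */
    static int cmps(int32_t v0, int32_t v1)
    {
        const uint32_t offset = 0x80000000u;
        return cmpu((uint32_t)v0 + offset, (uint32_t)v1 + offset);
    }

    /* Requested mapping: 1 if x < y, 0 if x == y, 2 if x > y. */
    int Compare(int32_t x, int32_t y)
    {
        static const int retvals[] = { 1, 0, 2 };
        return retvals[cmps(x, y) + 1];
    }

    /* Usage: assert(Compare(-4, 7) == 1); */
    ```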
    
  • 2020-12-01 12:05
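    int Compare(int x, int y) {
        /* + binds tighter than <<, so (y < x) must be parenthesized
           before shifting; otherwise the whole sum is shifted. */
        return (x < y) + ((y < x) << 1);
    }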
    

    Edit: Bitwise only? Guess < and > don't count, then?

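    int Compare(int x, int y) {
        int diff = x - y;  /* can overflow for extreme inputs */
        /* Flawed: a positive diff yields 1 and a negative one yields 3
           (1 | 2), instead of 2 and 1. Assumes 32-bit int for the mask. */
        return (!!diff) | (!!(diff & 0x80000000) << 1);
    }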
    

    But there's still that pesky minus (the subtraction).

    Edit: Shift the other way around.

    Meh, just to try again:

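    int Compare(int x, int y) {
        int diff = y - x;  /* can overflow for extreme inputs */
        /* (diff >> 31) & 1 is the sign bit of a 32-bit int; right-shifting a
           negative value is implementation-defined (arithmetic on common compilers). */
        return (!!diff) << ((diff >> 31) & 1);
    }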
    

    But I'm guessing there's no standard ASM instruction for !!. Also, the << can be replaced with +, depending on which is faster...
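
    Here is a sketch of that last version, plus the + variant, with two portability changes that the answer does not make. First, the difference is computed in long long, so it cannot overflow for int inputs; this assumes long long is wider than int, as on common platforms. Second, the sign bit is read through an unsigned conversion, which avoids the implementation-defined right shift of a negative value. CompareAdd is a hypothetical name for the + variant.

    ```c
    #include <assert.h>  /* for the usage checks */
    #include <limits.h>  /* CHAR_BIT, INT_MIN, INT_MAX */

    /* Shift variant: diff = y - x, result = (diff != 0) << sign(diff).
       x < y gives 1 << 0 == 1, x > y gives 1 << 1 == 2, equal gives 0. */
    int Compare(int x, int y)
    {
        long long diff = (long long)y - x;  /* no overflow if long long is wider */
        unsigned sign = (unsigned)((unsigned long long)diff
                                   >> (sizeof diff * CHAR_BIT - 1));
        return (!!diff) << sign;
    }

    /* + variant: (diff != 0) + sign(diff) gives the same 0, 1, 2 results. */
    int CompareAdd(int x, int y)
    {
        long long diff = (long long)y - x;
        unsigned sign = (unsigned)((unsigned long long)diff
                                   >> (sizeof diff * CHAR_BIT - 1));
        return (!!diff) + (int)sign;
    }

    /* Usage: assert(Compare(INT_MAX, INT_MIN) == 2); */
    ```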

    Bit twiddling is fun!

    Hmm, I just learned about setnz.

    I haven't checked the assembler output (but I did test it a bit this time), and with a bit of luck it could save a whole instruction:

    IN THEORY. MY ASSEMBLER IS RUSTY

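    # Assumes the System V x86-64 ABI: x in %edi, y in %esi, result in %eax
    xorl  %eax, %eax     # clear %eax first (xor clobbers the flags)
    subl  %edi, %esi     # %esi = y - x, sets ZF
    setnz %al            # %al = (diff != 0); setcc writes only a byte register
    movl  %esi, %ecx
    shrl  $31, %ecx      # %ecx = sign bit of diff (0 or 1)
    shll  %cl, %eax      # %eax = (diff != 0) << sign; variable counts must be in %cl
    ret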
    

    Rambling is fun.

    I need sleep.

  • 2020-12-01 12:05

    I'm siding with Tordek's original answer:

    int compare(int x, int y) {
        return (x < y) + 2*(y < x);
    }
    

    Compiling with gcc -O3 -march=pentium4 produces branch-free code that uses the conditional set-byte instructions setg and setl (see this explanation of x86 instructions).

    push   %ebp
    mov    %esp,%ebp
    mov    %eax,%ecx
    xor    %eax,%eax
    cmp    %edx,%ecx
    setg   %al
    add    %eax,%eax
    cmp    %edx,%ecx
    setl   %dl
    movzbl %dl,%edx
    add    %edx,%eax
    pop    %ebp
    ret 
    