Hamming weight ( number of 1 in a number) mixing C with assembly

后端 未结 4 1375
花落未央
花落未央 2020-12-21 11:39

I\'m trying to count how many number 1, are in the numbers of an array.

First I have a code in C lenguaje(work ok):

int popcount2(int* array, int le         


        
4条回答
  •  时光说笑
    2020-12-21 11:47

    The nicest think you can do is using built in popcount function as suggested by Paul R, but since you need to write it in assembly, this worked for me:

    asm (
    "start:                  \n"
            "and %0, %1      \n"
            "jz end          \n"
            "shr $0, %1      \n"
            "jnc start       \n"
            "inc %1          \n"
            "jmp start       \n"
    "end:                    \n"
            : "+g" (result),
              "+r" (x)
            :
            : "cc"
    );
    

    At first two lines you just check the contents of x (and go to end if it's zero Jump Zero). Than you shift x one bit to right and:

    At the end of the shift operation, the CF flag contains the last bit shifted out of the destinationoperand. *

    If there's no CF set, just go to start (Jump Not Carry) else increment result and then go to start.

    And the beautiful think about assembly is that you can do things in so many ways...

    asm (
    "start:                  \n"
            "shr $1, %1      \n"
            "jnc loop_cond   \n"
            "inc %0          \n"
            "and %1, %1      \n"
    "loop_cond:              \n"
            "jnz start       \n"
    
            : "+g" (result),
              "+r" (x)
            :
            : "cc"
    );
    

    Here you again use SHift Right instruction, if no CF is present just go to loop condition.

    Otherwise again increment result and call binary AND (INC does modify ZF).


    Using LOOP and ECX

    I was curious how to do this in 3 instruction (I figured your teacher wouldn't give you bottom limit of 3 if it wasn't possible) and I realized x86 also has LOOP instruction:

    Each time the LOOP instruction is executed, the count register is decremented, then checked for 0. If the count is 0, the loop is terminated and program execution continues with the instruction following the LOOP instruction. If the count is not zero, a near jump is performed to the destination (target) operand, which is presumably the instruction at the beginning of the loop. *

    And you can add input argument using GCCs input constrain:

    c - The c register.

    asm (
    "start:              \n"
        "shr $1, %1      \n"
        "adc $0, %0      \n"
        "loop start      \n"
    
        : "+g" (result)
        : "r" (x),
          "c" (8)             // Assuming 8b type (char)
    );
    

    Just to make sure it compiles to proper assembly:

    0x000000000040051f <+25>:   mov    $0x8,%ecx
    0x0000000000400524 <+30>:   mov    -0x8(%rbp),%eax
    0x0000000000400527 <+33>:   shr    %edx
    0x0000000000400529 <+35>:   adc    $0x0,%eax
    0x000000000040052c <+38>:   loop   0x400527 
    

    I think the first one should have a bit better performance, especially if there's just 1 bit set, this approach always does k*8 iterations.


    SSE4 and single instruction

    I know you have to use a loop, but just for fun... With SSE4 extension you could do this by just one instruction POPCNT:

    This instruction calculates of number of bits set to 1 in the second operand (source) and returns the count in the first operand (a destination register). *

    I think (I have a quite old CPU on my notebook, so I can't test this for you) you should be able to do this with just one simple instruction:

    asm (   
        "POPCNT %1, %0   \n"
        : "=r" (result)
        : "mr" (x)
        : "cc"                                                                                                                                       
    );
    

    (If you try this and you do have SSE4 extension, please let me know if it works)


    Performance

    I've measured times required to do 100,000,000 popcounts comparing my first and second methods with David Wohlferd's. [Raw data]

    +--------------+------------+------------+------------+
    |              | 0x00000000 | 0x80000001 | 0xffffffff |
    +--------------+------------+------------+------------+
    | 1st solution |  0.543     |  5.040     |  3.833     |
    | LOOP         | 11.530     | 11.523     | 11.523     |
    | Davids       |  0.750     |  4.893     |  4.890     |
    +--------------+------------+------------+------------+
    

    If anyone can compare these 3 with SSE4's POPCNT instruction I'd be glad.

提交回复
热议问题