Question
Using the C program:
int main(int argc , char** argv)
{
return __builtin_popcountll(0xf0f0f0f0f0f0f0f0);
}
and the compiler line (gcc 4.4 - Intel Xeon L3426):
gcc -msse4.2 poptest.c -o poptest
I do NOT get the builtin popcnt instruction; instead, the compiler generates a lookup table and computes the popcount that way. The resulting binary is over 8000 bytes. (Yuk!)
Thanks so much for any assistance.
Answer 1:
You have to tell GCC to generate code for an architecture that supports the popcnt instruction:
gcc -march=corei7 popcnt.c
Or just enable support for popcnt:
gcc -mpopcnt popcnt.c
In your example program the parameter to __builtin_popcountll is a constant, so the compiler will probably do the calculation at compile time and never emit the popcnt instruction. GCC does this even if not asked to optimize the program.
So try passing it something that it can't know at compile time:
int main (int argc, char** argv)
{
return __builtin_popcountll ((long long) argv);
}
$ gcc -march=corei7 -O popcnt.c && objdump -d a.out | grep '<main>' -A 2
0000000000400454 <main>:
400454: f3 48 0f b8 c6 popcnt %rsi,%rax
400459: c3 retq
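The same idea can be packaged so that even a literal argument is not folded away. This is only a minimal sketch, assuming GCC or a compatible compiler; the name popcount_runtime is hypothetical and not part of any library. Reading the value back through a volatile object forces a real load, so the compiler has to count the bits at run time.

```c
/* Hypothetical helper (not from the original post): counts set bits
 * in a value the compiler must treat as unknown at compile time. */
int popcount_runtime(unsigned long long x)
{
    /* The volatile read forces a real load, which blocks constant folding. */
    volatile unsigned long long v = x;
    return __builtin_popcountll(v);
}
```

Whether the body becomes a popcnt instruction still depends on -mpopcnt or a suitable -march. Without either, GCC calls its library helper instead, and the result is the same.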
Answer 2:
You need to do it like this:
#include <stdio.h>
#include <smmintrin.h>
int main(void)
{
int pop = _mm_popcnt_u64(0xf0f0f0f0f0f0f0f0ULL);
printf("pop = %d\n", pop);
return 0;
}
$ gcc -Wall -m64 -msse4.2 popcnt.c -o popcnt
$ ./popcnt
pop = 32
$
EDIT
Oops - I just checked the disassembly output with gcc 4.2 and ICC 11.1. While ICC 11.1 correctly generates popcntl or popcntq, for some reason gcc does not: it calls ___popcountdi2 instead. Weird. I will try a newer version of gcc when I get a chance and see if it's fixed. I guess the only workaround otherwise is to use ICC instead of gcc.
Answer 3:
For __builtin_popcountll in GCC, all you need to do is add -mpopcnt:
#include <stdlib.h>
int main(int argc, char **argv) {
return __builtin_popcountll(atoi(argv[1]));
}
With -mpopcnt:
$ otool -tvV a.out
a.out:
(__TEXT,__text) section
_main:
0000000100000f66 pushq %rbp
0000000100000f67 movq %rsp, %rbp
0000000100000f6a subq $0x10, %rsp
0000000100000f6e movq %rdi, -0x8(%rbp)
0000000100000f72 movq -0x8(%rbp), %rax
0000000100000f76 addq $0x8, %rax
0000000100000f7a movq (%rax), %rax
0000000100000f7d movq %rax, %rdi
0000000100000f80 callq 0x100000f8e ## symbol stub for: _atoi
0000000100000f85 cltq
0000000100000f87 popcntq %rax, %rax
0000000100000f8c leave
0000000100000f8d retq
Without -mpopcnt:
a.out:
(__TEXT,__text) section
_main:
0000000100000f55 pushq %rbp
0000000100000f56 movq %rsp, %rbp
0000000100000f59 subq $0x10, %rsp
0000000100000f5d movq %rdi, -0x8(%rbp)
0000000100000f61 movq -0x8(%rbp), %rax
0000000100000f65 addq $0x8, %rax
0000000100000f69 movq (%rax), %rax
0000000100000f6c movq %rax, %rdi
0000000100000f6f callq 0x100000f86 ## symbol stub for: _atoi
0000000100000f74 cltq
0000000100000f76 movq %rax, %rdi
0000000100000f79 callq 0x100000f80 ## symbol stub for: ___popcountdi2
0000000100000f7e leave
0000000100000f7f retq
Notes
Be sure to check the POPCNT feature bit (bit 23 of ECX returned by CPUID leaf 1) before using POPCNTQ. The ABM flag is a separate bit: bit 5 of ECX in extended leaf 0x80000001.
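The run-time check could look roughly like the sketch below. It assumes GCC or Clang on x86, uses __get_cpuid from <cpuid.h>, and tests bit 23 of ECX from CPUID leaf 1, which is where POPCNT support is reported. The function names (cpu_has_popcnt, popcount64_hw, popcount64_soft, popcount64_checked) are hypothetical. The target("popcnt") attribute enables the instruction for that one function, so the rest of the file can be built without -mpopcnt.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns 1 if CPUID leaf 1 reports POPCNT (ECX bit 23), otherwise 0. */
static int cpu_has_popcnt(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    return (int)((ecx >> 23) & 1u);
}

/* Compiled with popcnt enabled for this function only. */
static __attribute__((target("popcnt"))) int popcount64_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}
#endif

/* Portable fallback: clears the lowest set bit until none remain. */
static int popcount64_soft(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Uses the hardware instruction only when the CPU reports support. */
int popcount64_checked(uint64_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_popcnt())
        return popcount64_hw(x);
#endif
    return popcount64_soft(x);
}
```

In a hot loop the CPUID result would normally be cached once at startup rather than queried on every call; the sketch keeps the check inline for brevity.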
Source: https://stackoverflow.com/questions/6427473/how-to-generate-a-sse4-2-popcnt-machine-instruction