Question
I am planning to implement runtime detection of SIMD extensions. Is it such that if I find out that the processor has AVX2 support, it is also guaranteed to have SSE4.2 and AVX support?
Answer 1:
Support for a more-recent Intel SIMD ISA extension implies support for previous SIMD ones.
AVX2 definitely implies AVX1.
I think AVX1 implies that all of the SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2 feature bits must also be set in CPUID. Even if that is not formally guaranteed, much software makes this assumption, and a CPU that violated it would probably not be commercially viable for general use.
Note that popcnt has its own feature bit, so in theory you could have a CPU with AVX2 and SSE4.2 but not popcnt, but many things treat SSE4.2 as implying popcnt. So it's more like you can advertise support for popcnt without SSE4.2.
In theory you could make a CPU (or virtual machine) with AVX but which didn't accept the non-VEX legacy-SSE encoding of SSE4.2 instructions like pcmpistri, but I think you'd be violating Intel's guarantees about what the AVX feature bit implies. Not sure if that's formally written down in a manual, but most software will assume that.
But AVX1 does imply support for the VEX encoding of all SSE4.2 and earlier SIMD instructions, e.g. vpcmpistri or vminss.
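
For runtime detection, one conservative way to act on the points above is to test every feature bit a code path relies on, rather than inferring SSE4.2 or popcnt from AVX2. The following is only a sketch under stated assumptions: the helper names has_all and can_use_avx2_path are hypothetical, and the function takes the raw CPUID register values as arguments (leaf 1 ECX and leaf 7 sub-leaf 0 EBX) instead of executing CPUID itself. It also assumes the caller separately confirms that the OS saves YMM state (OSXSAVE plus XGETBV), which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature-bit masks in the CPUID output registers. */
#define LEAF1_ECX_SSE4_2 (1u << 20) /* CPUID.1:ECX bit 20 */
#define LEAF1_ECX_POPCNT (1u << 23) /* CPUID.1:ECX bit 23 */
#define LEAF1_ECX_AVX    (1u << 28) /* CPUID.1:ECX bit 28 */
#define LEAF7_EBX_AVX2   (1u << 5)  /* CPUID.(EAX=7,ECX=0):EBX bit 5 */

/* Hypothetical helper: true only if every bit of `mask` is set in `reg`. */
static bool has_all(uint32_t reg, uint32_t mask)
{
    assert(mask != 0); /* an empty mask would make the check vacuously true */
    return (reg & mask) == mask;
}

/* Hypothetical dispatcher predicate for a code path that uses AVX2,
   legacy-SSE SSE4.2 instructions and popcnt. Each bit is checked
   explicitly instead of being inferred from the AVX2 bit.
   Assumption: OS support for YMM state is verified elsewhere. */
static bool can_use_avx2_path(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    return has_all(leaf7_ebx, LEAF7_EBX_AVX2) &&
           has_all(leaf1_ecx,
                   LEAF1_ECX_AVX | LEAF1_ECX_SSE4_2 | LEAF1_ECX_POPCNT);
}
```

With this check, a (theoretical) CPU that sets AVX2, AVX and SSE4.2 but clears the popcnt bit is simply sent down the fallback path, at the cost of a few extra bit tests.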
gcc -mavx2 definitely implies AVX1 and previous extensions, but will only emit code that uses the VEX encoding. It will define the __SSE4_2__ macro and so on, though, so gcc does treat AVX2 as implying earlier SSE extensions and popcnt, but not FMA, AES-NI or PCLMUL. Those are separate features even for GCC.
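
To see which of these macros a given set of flags defines, a small probe like the one below can help. It is a sketch that assumes GCC or Clang (MSVC does not define most of these macros); the struct and function names are hypothetical, and everything it reports is fixed at compile time, not detected at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of which ISA macros the compiler predefined
   for this translation unit. */
struct build_features {
    bool sse4_2, popcnt, avx, avx2, fma;
};

static struct build_features compiled_features(void)
{
    struct build_features f = { false, false, false, false, false };
#ifdef __SSE4_2__
    f.sse4_2 = true;
#endif
#ifdef __POPCNT__
    f.popcnt = true;
#endif
#ifdef __AVX__
    f.avx = true;
#endif
#ifdef __AVX2__
    f.avx2 = true;
#endif
#ifdef __FMA__
    f.fma = true; /* not implied by -mavx2; needs -mfma or a -march= that has it */
#endif
    return f;
}

/* Checks the implication described in the text: a GCC/Clang build with
   __AVX2__ also has __AVX__ and __SSE4_2__. FMA is deliberately left
   out because it is a separate feature. */
static void check_gcc_implications(void)
{
    struct build_features f = compiled_features();
    assert(!f.avx2 || (f.avx && f.sse4_2));
}
```

Compiling this with -mavx2 alone should leave the fma field false, while -mavx2 -mfma or a suitable -march= setting turns it on.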
(In practice you should use gcc -march=native or gcc -march=znver1 or whatever to enable all the features your CPU has, and set tuning options for it. Don't use just -mavx2 -mfma: that leaves tuning settings at bad defaults like splitting every possibly-unaligned 256-bit load/store into 128-bit halves.)
(Note that MSVC doesn't have as many SIMD ISA detection macros; it has one for AVX but not for all of the earlier SSE* extensions. MSVC's model is designed around the assumption that programs will do runtime CPU detection instead of being compiled for the local machine. Although MSVC does now have AVX and AVX2 options to use those as baselines.)
Note that AVX512 kind of breaks the tradition. AVX512F implies support for AVX2 and everything before it, but beyond that AVX512DQ doesn't come "before" or "after" AVX512ER, for example. You can (in theory) have either, both, or neither. (In practice, Skylake-X/Cannonlake/etc. has only a bit of overlap with Xeon Phi (Knights Landing / Knights Mill) beyond AVX512F. See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512)
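
Because the AVX512 subsets are independent, a dispatcher has to test each one it needs on top of AVX512F. A minimal sketch, assuming the caller passes in the EBX value from CPUID leaf 7 sub-leaf 0 and has already verified OS support for ZMM state (not shown); the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0):EBX. */
enum {
    AVX512F_BIT  = 16,
    AVX512DQ_BIT = 17,
    AVX512PF_BIT = 26,
    AVX512ER_BIT = 27,
    AVX512CD_BIT = 28,
    AVX512BW_BIT = 30,
    AVX512VL_BIT = 31
};

/* Hypothetical helper: test a single feature bit of the EBX value. */
static bool leaf7_ebx_has(uint32_t ebx, int bit)
{
    assert(bit >= 0 && bit < 32);
    return ((ebx >> bit) & 1u) != 0;
}

/* A kernel using AVX512DQ instructions needs both F and DQ. */
static bool can_use_avx512dq_kernel(uint32_t ebx)
{
    return leaf7_ebx_has(ebx, AVX512F_BIT) &&
           leaf7_ebx_has(ebx, AVX512DQ_BIT);
}

/* A kernel using AVX512ER instructions needs both F and ER;
   having DQ says nothing about ER, and vice versa. */
static bool can_use_avx512er_kernel(uint32_t ebx)
{
    return leaf7_ebx_has(ebx, AVX512F_BIT) &&
           leaf7_ebx_has(ebx, AVX512ER_BIT);
}
```

For example, a Skylake-X-style value with F, CD, DQ, BW and VL set passes the DQ check but not the ER check, while a Knights Landing-style value with F, CD, PF and ER set does the opposite.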
Answer 2:
If we set the compiler option -mavx2, then GCC doesn't give an error when we use AVX or SSE intrinsics. So GCC supposes that the presence of the AVX2 flag is enough to run AVX and SSE code. Of course, that does not guarantee that someone won't create a CPU with AVX2 and without SSE.
Answer 3:
In principle, a CPU could support AVX2 without supporting any SSE4 instructions (which isn't as stupid an idea as it sounds!). In practice though, if it supports AVX2, it also supports SSE4.
Source: https://stackoverflow.com/questions/53443249/do-all-cpus-which-support-avx2-also-support-sse4-2-and-avx