Trying to understand clang/gcc __builtin_memset on constant size / aligned pointers
Question

Basically, I am trying to understand why both gcc and clang use xmm registers for __builtin_memset even when the destination address and the size are both divisible by sizeof ymm (or zmm, for that matter) and the CPU supports AVX2 / AVX512, and why GCC implements __builtin_memset for medium-sized values without any SIMD at all (again assuming the CPU supports SIMD).

For example:

    __builtin_memset(__builtin_assume_aligned(ptr, 64), -1, 64);

will compile to:

    vpcmpeqd %xmm0, %xmm0, %xmm0
    vmovdqa %xmm0, (%rdi)
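For anyone who wants to reproduce this, a minimal translation unit along these lines should be enough. This is only a sketch: the function name fill64 is made up for illustration, and the flags mentioned afterwards are an assumption about a reasonable setup, not necessarily the exact ones behind the output quoted here.

```c
#include <stddef.h>

/* Hypothetical wrapper (the name fill64 is illustrative only).
 * The pointer is promised to be 64-byte aligned and the size is a
 * compile-time constant of 64 bytes, so a single zmm store or two
 * ymm stores would in principle be enough to cover it. */
void fill64(void *ptr)
{
    /* -1 is converted to unsigned char, so every byte becomes 0xFF. */
    __builtin_memset(__builtin_assume_aligned(ptr, 64), -1, 64);
}
```

Compiling with something like `gcc -O3 -march=skylake-avx512 -S` (or the same flags with clang) and reading the generated assembly shows which register width the compiler picked for the stores.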