Performance difference between two seemingly equivalent assembly codes
tl;dr: I have two functionally equivalent pieces of C code that I compile with Clang (the fact that it's C code doesn't matter much; only the assembly is interesting, I think). IACA tells me that one should be faster, but I don't understand why, and my benchmarks show the same performance for both versions.

I have the following C code (ignore #include "iacaMarks.h", IACA_START and IACA_END for now):

ref.c:

#include "iacaMarks.h"
#include <x86intrin.h>

#define AND(a,b) _mm_and_si128(a,b)
#define OR