For one of my OS X programs, I have a few optimized cases which use SSE4.1 instructions. On SSE3-only machines, the non-optimized branch is run:
// SupportsSSE4
There is currently no way to target different ISA extensions at block or function granularity in clang. You can only do it at file granularity (put your SSE4.1 code into a separate file and compile that file with -msse4.1). If this is an important feature for you, please file a bug report to request it!
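As a rough sketch of how the file-granularity approach can be wired up, the dispatch code below is compiled with the default flags and only ever reaches the SSE4.1 routine through a function pointer. All names here (supports_sse41, dot4_scalar, select_dot4) are made up for illustration, and the SSE4.1 routine itself is assumed to live in a separate file built with -msse4.1. The CPU check reads CPUID leaf 1, ECX bit 19, through the __get_cpuid helper from <cpuid.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: true if the CPU reports SSE4.1
 * (CPUID leaf 1, ECX bit 19). Always false on non-x86 targets. */
static bool supports_sse41(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & (1u << 19)) != 0;
#else
    return false;
#endif
}

/* Portable fallback, safe to run on SSE3-only machines. */
static float dot4_scalar(const float *a, const float *b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

typedef float (*dot4_fn)(const float *, const float *);

/* Choose an implementation once at startup. sse41_impl is assumed to be
 * defined in a separate file compiled with -msse4.1; pass NULL if that
 * file is not part of the build. */
static dot4_fn select_dot4(dot4_fn sse41_impl) {
    if (sse41_impl != NULL && supports_sse41())
        return sse41_impl;
    return dot4_scalar;
}
```

The caller would store the result of select_dot4 once and call through it afterwards, so no SSE4.1 instruction is executed on a machine that lacks the extension.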
However, I should note that the actual benefit of DPPS is pretty small in most real scenarios (and using DPPS even slows down some code sequences!). Unless this particular code sequence is critical, and you have carefully measured the effect of using DPPS, it may not be worth the hassle of special-casing SSE4.1 even if that compiler feature were available.