Proper way to enable SSE4 on a per-function / per-block of code basis?

Submitted by 不打扰是莪最后的温柔 on 2020-01-01 03:16:09

Question


For one of my OS X programs, I have a few optimized cases which use SSE4.1 instructions. On SSE3-only machines, the non-optimized branch is run:

// SupportsSSE4_1 returns true on CPUs that support SSE4.1, false otherwise
if (SupportsSSE4_1()) {

    // Code that uses _mm_dp_ps, an SSE4 instruction

    ...

    __m128 hDelta   = _mm_sub_ps(here128, right128);
    __m128 vDelta   = _mm_sub_ps(here128, down128);

    hDelta = _mm_sqrt_ss(_mm_dp_ps(hDelta, hDelta, 0x71));
    vDelta = _mm_sqrt_ss(_mm_dp_ps(vDelta, vDelta, 0x71));

    ...

} else {
    // Equivalent code that uses SSE3 instructions
    ...
}

In order to get the above to compile, I had to set CLANG_X86_VECTOR_INSTRUCTIONS to sse4.1.

However, this seems to instruct clang that it's ok to use the ROUNDSD instruction anywhere in my program. Hence, the program is crashing on SSE3-only machines with SIGILL: ILL_ILLOPC.

What's the best practice for enabling SSE4.1 for just the lines of code inside the true branch of the SupportsSSE4_1() if block?


Answer 1:


There is currently no way to target different ISA extensions at block / function granularity in clang. You can only do it at file granularity (put your SSE4.1 code into a separate file and specify that file to use -msse4.1). If this is an important feature for you, please file a bug report to request it!

However, I should note that the actual benefit of DPPS is pretty small in most real scenarios (and using DPPS even slows down some code sequences!). Unless this particular code sequence is critical, and you have carefully measured the effect of using DPPS, it may not be worth the hassle to special-case SSE4.1 even if that compiler feature is available.
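To make the comparison concrete, here is a rough sketch of a four-element dot product that avoids DPPS and needs only baseline SSE (multiply, shuffle, add). The function name dot4_no_dpps is hypothetical and not from the original post, and no performance claim is implied: whether it beats DPPS on a given CPU is something you would have to measure.

```cpp
#if defined(__SSE__)
#include <xmmintrin.h>  // baseline SSE intrinsics only
#endif

// Hypothetical helper (not from the original post): dot product of two
// 4-float vectors without the SSE4.1 DPPS instruction.
float dot4_no_dpps(const float* a, const float* b) {
#if defined(__SSE__)
    // prod = { p0, p1, p2, p3 }
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // swap = { p1, p0, p3, p2 }
    __m128 swap = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    // sums = { p0+p1, p0+p1, p2+p3, p2+p3 }
    __m128 sums = _mm_add_ps(prod, swap);
    // high = upper half of sums moved into the lower half
    __m128 high = _mm_movehl_ps(sums, sums);
    // lane 0 now holds p0+p1+p2+p3
    return _mm_cvtss_f32(_mm_add_ss(sums, high));
#else
    // Scalar fallback for targets without SSE
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Because it uses no SSE4.1 instructions, a function like this can live in a file compiled without -msse4.1 and run on SSE3-only machines.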




Answer 2:


You can make a CPU dispatcher. You can do this in one file, but you have to compile it twice: first with SSE4.1 enabled and then without, linking in the SSE4.1 object file on the second pass. The first time you call your function myfunc, it calls myfunc_dispatch, which determines the instruction set and points myfunc_pointer at either myfunc_SSE41 or myfunc_SSE3. On every later call, myfunc jumps straight to the function for your instruction set.

//clang -c -O3 -msse4.1 foo.cpp -o foo_sse41.o
//clang -O3 -msse3 foo.cpp foo_sse41.o   

typedef float MyFuncType(float*);

MyFuncType myfunc, myfunc_SSE41, myfunc_SSE3, myfunc_dispatch;
MyFuncType * myfunc_pointer = &myfunc_dispatch;

#ifdef __SSE4_1__
float myfunc_SSE41(float* a) {
    //SSE41 code
}
#else
float  myfunc_SSE3(float *a) {
    //SSE3 code
}

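float myfunc_dispatch(float *a) {
    // First call: pick the implementation once, then forward to it
    if(SupportsSSE4_1()) {
        myfunc_pointer = myfunc_SSE41;
    }
    else {
        myfunc_pointer = myfunc_SSE3;
    }
    return myfunc_pointer(a);
}

float myfunc(float *a) {
    // Later calls go straight to the selected implementation
    return (*myfunc_pointer)(a);
}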
int main() {
    //myfunc(a);
}
#endif
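
The dispatcher above calls SupportsSSE4_1(), which neither the question nor this answer defines. One possible sketch, assuming GCC or clang (both ship a cpuid.h header with a __get_cpuid helper), reads CPUID leaf 1 and tests ECX bit 19, the SSE4.1 feature flag. The body below is an illustration, not the asker's actual implementation.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/clang helper header for the CPUID instruction
#endif

// Hypothetical implementation: the original post does not show this function.
bool SupportsSSE4_1() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 if the requested leaf is not available
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ecx & (1u << 19)) != 0;  // CPUID.1:ECX bit 19 = SSE4.1
#else
    return false;  // not an x86 target
#endif
}
```

This function contains no SSE4.1 instructions itself, so it belongs in the object file compiled without -msse4.1.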



Answer 3:


Depending on the OS you might be able to use something like Function Multiversioning in the future. I'm working on the feature right now, but it'll be a while before it's ready for use in a production compiler.

See http://gcc.gnu.org/wiki/FunctionMultiVersioning for more details.
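
The wiki page builds on the per-function target attribute. As a rough sketch of that building block with hand-written dispatch (not the automatic dispatch that multiversioning is meant to provide), the code below assumes a GCC or clang release new enough to accept intrinsics inside a function marked __attribute__((target("sse4.1"))) without passing -msse4.1 for the whole file, and assumes __builtin_cpu_supports is available. The names dot4, dot4_sse41 and dot4_baseline are hypothetical.

```cpp
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <smmintrin.h>  // SSE4.1 intrinsics, including _mm_dp_ps
#define HAVE_X86_TARGET_ATTR 1
#endif

// Portable baseline that any CPU can run
static float dot4_baseline(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

#ifdef HAVE_X86_TARGET_ATTR
// Only this function is compiled with SSE4.1 enabled
__attribute__((target("sse4.1")))
static float dot4_sse41(const float* a, const float* b) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    // Mask 0xF1: multiply all four lanes, write the sum to lane 0
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
}
#endif

// Manual dispatch: check the CPU at run time before using the SSE4.1 path
float dot4(const float* a, const float* b) {
#ifdef HAVE_X86_TARGET_ATTR
    if (__builtin_cpu_supports("sse4.1")) {
        return dot4_sse41(a, b);
    }
#endif
    return dot4_baseline(a, b);
}
```

The rest of the translation unit is still compiled for the baseline target, so the compiler should not emit SSE4.1 instructions outside dot4_sse41.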



Source: https://stackoverflow.com/questions/24101875/proper-way-to-enable-sse4-on-a-per-function-per-block-of-code-basis
