I work with two computers. One without AVX support and one with AVX. It would be convenient to have my code find the instruction set supported by my CPU at run-time and ch
The fact that the link order matters makes me think that there might be some kind of initialization code in the obj file. If the initialization code is communal, then only the first one is taken. I can't reproduce it, but you should be able to see it in an assembly listing (compile with /c /Fatestavx.asm).
I realise that this is an old question and that the person who asked it appears to be no longer around, but I hit the same problem yesterday. Here's what I worked out.
When compiled, both your sse2.cpp and avx.cpp files produce object files that contain not only your function but also any required template functions (e.g. Vec8f::load). These template functions are also compiled using the requested instruction set.
This means that your sse2.obj and avx.obj object files will both contain definitions of Vec8f::load, each compiled using its respective instruction set.
However, since the compiler treats Vec8f::load as externally visible, it puts it in a 'COMDAT' section of the object file with a 'selectany' (aka 'pick any') label. This tells the linker that if it sees multiple definitions of this symbol, for example in 2 different object files, then it is allowed to pick any one it likes. (It does this to reduce duplicate code in the final executable, which would otherwise be inflated in size by multiple definitions of template and inline functions.)
The problem you are having is directly related to this in that the order of the object files passed to the linker is affecting which one it picks. Specifically here, it appears to be picking the first definition it sees.
If this was avx.obj then the AVX compiled version of Vec8f::load will always be used. This will crash on a machine that doesn't support that instruction set.
On the other hand if sse2.obj is first then the SSE2 compiled version will always be used. This won't crash but it will only use SSE2 instructions even if AVX is supported.
That this is the case can be seen if you look at the linker 'map' file output (produced using the /map option.) Here are the relevant (edited) excerpts -
//
// link with sse2.obj before avx.obj
//
0001:00000080 _main foo.obj
0001:00000330 ?func_sse2@@YAMPBM@Z sse2.obj
0001:00000420 ??0Vec256fe@@QAE@XZ sse2.obj
0001:00000440 ??0Vec4f@@QAE@ABT__m128@@@Z sse2.obj
0001:00000470 ??0Vec8f@@QAE@XZ sse2.obj <-- sse2 version used
0001:00000490 ??BVec4f@@QBE?AT__m128@@XZ sse2.obj
0001:000004c0 ?get_high@Vec8f@@QBE?AVVec4f@@XZ sse2.obj
0001:000004f0 ?get_low@Vec8f@@QBE?AVVec4f@@XZ sse2.obj
0001:00000520 ?load@Vec8f@@QAEAAV1@PBM@Z sse2.obj <-- sse2 version used
0001:00000680 ?func_avx@@YAMPBM@Z avx.obj
0001:00000740 ??BVec8f@@QBE?AT__m256@@XZ avx.obj
//
// link with avx.obj before sse2.obj
//
0001:00000080 _main foo.obj
0001:00000270 ?func_avx@@YAMPBM@Z avx.obj
0001:00000330 ??0Vec8f@@QAE@XZ avx.obj <-- avx version used
0001:00000350 ??BVec8f@@QBE?AT__m256@@XZ avx.obj
0001:00000380 ?load@Vec8f@@QAEAAV1@PBM@Z avx.obj <-- avx version used
0001:00000580 ?func_sse2@@YAMPBM@Z sse2.obj
0001:00000670 ??0Vec256fe@@QAE@XZ sse2.obj
0001:00000690 ??0Vec4f@@QAE@ABT__m128@@@Z sse2.obj
0001:000006c0 ??BVec4f@@QBE?AT__m128@@XZ sse2.obj
0001:000006f0 ?get_high@Vec8f@@QBE?AVVec4f@@XZ sse2.obj
0001:00000720 ?get_low@Vec8f@@QBE?AVVec4f@@XZ sse2.obj
As for fixing it, that's another matter. In this case, the following blunt hack should work by forcing the avx version to have its own differently named versions of the template functions. This will increase the resulting executable size as it will contain multiple versions of the same function even if the sse2 and avx versions are identical.
// avx.cpp
namespace AVXWrapper {
#include "vectorclass.h"
}
using namespace AVXWrapper;
float func_avx(const float* a)
{
...
}
There are some important limitations though - (a) if the included file manages any form of global state it will no longer be truly global as you will have 2 'semi-global' versions, and (b) you won't be able to pass vectorclass variables as parameters between other code and functions defined in avx.cpp.
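To make limitation (b) concrete, here is a small single-file sketch. It does not use vectorclass.h itself: the macro VEC_HEADER_BODY, the toy Vec8f it defines, and the helpers to_avx and roundtrip_last_lane are all hypothetical stand-ins, used only so that the same class text can appear both at global scope and inside AVXWrapper.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the text of a header such as vectorclass.h.
// A macro is used only so the same definition can be expanded twice in one file.
#define VEC_HEADER_BODY                      \
    struct Vec8f {                           \
        float v[8];                          \
        Vec8f& load(const float* p) {        \
            std::memcpy(v, p, sizeof(v));    \
            return *this;                    \
        }                                    \
    };

VEC_HEADER_BODY                              // roughly what sse2.cpp sees: ::Vec8f
namespace AVXWrapper { VEC_HEADER_BODY }     // roughly what avx.cpp sees: AVXWrapper::Vec8f

// Same name, same layout, but unrelated types. Their member functions get
// different decorated names, so the linker has nothing to merge.
static_assert(!std::is_same<Vec8f, AVXWrapper::Vec8f>::value,
              "wrapped and unwrapped Vec8f are distinct types");

// Hypothetical helper: the only way across the boundary is via plain data,
// because a ::Vec8f cannot be passed where an AVXWrapper::Vec8f is expected.
inline AVXWrapper::Vec8f to_avx(const Vec8f& x) {
    AVXWrapper::Vec8f y;
    y.load(x.v);
    return y;
}

// Loads eight floats into the global type, converts, and returns the last lane.
inline float roundtrip_last_lane(const float* p) {
    Vec8f a;
    a.load(p);
    AVXWrapper::Vec8f b = to_avx(a);
    assert(b.v[0] == p[0]);
    return b.v[7];
}
```

In practice this suggests keeping the interface of avx.cpp to plain types such as const float*, as func_avx already does.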
Put the SSE and AVX functions in different CPP files and be sure to compile the SSE version without /arch:AVX.
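With the two variants in separate files, the run-time choice could be sketched roughly as follows. This is an illustration under assumptions, not a drop-in solution: func_sse2 and func_avx are scalar stand-ins here so that the block compiles on its own (in a real build they would live in sse2.cpp and avx.cpp), and cpu_has_avx, select_func and func are hypothetical names. The file holding the dispatcher would itself need to be compiled without /arch:AVX.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid
#include <immintrin.h>  // _xgetbv
#endif

// Returns true if both the CPU and the OS support AVX.
static bool cpu_has_avx() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];
    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;  // ECX bit 27: OS uses XSAVE
    const bool avx     = (regs[2] & (1 << 28)) != 0;  // ECX bit 28: CPU has AVX
    if (!osxsave || !avx) return false;
    // XCR0 bits 1 and 2: the OS saves XMM and YMM state on context switch.
    return (_xgetbv(0) & 0x6) == 0x6;
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;  // unknown platform: fall back to the baseline version
#endif
}

// Stand-ins for the real functions in sse2.cpp and avx.cpp (assumption:
// both sum eight floats; the real bodies are not shown in the thread).
static float sum8(const float* a) {
    float s = 0.0f;
    for (int i = 0; i < 8; ++i) s += a[i];
    return s;
}
float func_sse2(const float* a) { return sum8(a); }
float func_avx(const float* a)  { return sum8(a); }

using func_t = float (*)(const float*);

// Picks the implementation that matches the machine the program runs on.
func_t select_func() {
    return cpu_has_avx() ? &func_avx : &func_sse2;
}

// Public entry point: the choice is made once, on the first call.
float func(const float* a) {
    static const func_t impl = select_func();
    assert(impl != nullptr);
    return impl(a);
}
```

Callers then use func(data) and never name the SSE2 or AVX variant directly. Note that this only dispatches the top-level function; it does not by itself stop the linker from merging shared inline or template functions between the two object files.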