Are older SIMD-versions available when using newer ones?

孤独总比滥情好 2021-02-13 15:47

If SSE3 or AVX is available, are older SIMD extensions such as SSE2 or MMX guaranteed to be available too,
or do I still need to check for them separately?

3 answers
  •  野趣味 (original poster)
    2021-02-13 16:19

    In general, these have been additive but keep in mind that there are differences between Intel and AMD support for these over the years.

    If you have AVX, then you can assume SSE, SSE2, SSE3, SSSE3, SSE4.1, and SSE4.2 as well. Remember that to use AVX you also need to check that the OSXSAVE CPUID bit is set, which tells you the OS has enabled XSAVE. For a strict check, also read XCR0 with XGETBV and confirm that bits 1 and 2 are set, so you know the OS actually saves the AVX registers.

    You should still explicitly check for all the CPUID support you use in your code for robustness (say checking for AVX, OSXSAVE, SSE4, SSE3, SSSE3 at the same time to guard your AVX codepaths).

    #include <intrin.h>
    
    inline bool IsAVXSupported()
    {
    #if defined(_M_IX86 ) || defined(_M_X64)
        int CPUInfo[4] = {-1};
        __cpuid( CPUInfo, 0 ); // leaf 0: EAX holds the highest supported basic leaf
    
        if ( CPUInfo[0] < 1 )
            return false;
    
        __cpuid( CPUInfo, 1 ); // leaf 1: feature bits in ECX and EDX
    
        int ecx = 0x10000000 // AVX
                  | 0x8000000 // OSXSAVE
                  | 0x100000 // SSE 4.2
                  | 0x80000 // SSE 4.1
                  | 0x200 // SSSE3
                  | 0x1; // SSE3
    
        // Every bit in the mask must be set
        if ( ( CPUInfo[2] & ecx ) != ecx )
            return false;
    
        return true;
    #else
        return false;
    #endif
    }
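
    The check above treats the OSXSAVE bit as sufficient. A stricter variant also reads XCR0 with XGETBV and confirms that the OS saves both the SSE and AVX register state (bits 1 and 2). The following is a minimal sketch of that idea, not the answer's original code; the names `OsSavesYmmState` and `IsAVXUsable` are made up for illustration, and the GCC/clang branch uses inline assembly so that no extra compiler flags are assumed.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #include <immintrin.h>
#elif defined(__i386__) || defined(__x86_64__)
  #include <cpuid.h>
#endif

// Hypothetical helper: true when XCR0 reports that the OS saves both
// SSE state (bit 1) and AVX/YMM state (bit 2).
constexpr bool OsSavesYmmState(std::uint64_t xcr0)
{
    return (xcr0 & 0x6) == 0x6;
}

// Hypothetical helper: CPUID AVX + OSXSAVE bits, then the XCR0 check.
inline bool IsAVXUsable()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int r[4] = {0};
    __cpuid(r, 0);
    if (r[0] < 1)
        return false;
    __cpuid(r, 1);
    const int mask = 0x10000000 | 0x8000000; // AVX | OSXSAVE
    if ((r[2] & mask) != mask)
        return false;
    return OsSavesYmmState(_xgetbv(0));
#elif defined(__i386__) || defined(__x86_64__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    const unsigned mask = 0x10000000u | 0x8000000u; // AVX | OSXSAVE
    if ((ecx & mask) != mask)
        return false;
    // XGETBV is only executed once OSXSAVE is known to be set;
    // otherwise the instruction would fault.
    unsigned lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return OsSavesYmmState((static_cast<std::uint64_t>(hi) << 32) | lo);
#else
    return false; // not an x86/x64 build
#endif
}
```

    The order matters: the CPUID bits are tested first, and XGETBV runs only after OSXSAVE has been confirmed.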
    

    SSE and SSE2 are required for all processors capable of x64 native, so they are good baseline assumptions for all code. Windows 8.0, Windows 8.1, and Windows 10 explicitly require SSE and SSE2 support even for x86 architectures so those instruction sets are pretty ubiquitous. In other words, if you fail a check for SSE or SSE2, just exit the app with a fatal error.

    #include <windows.h>
    
    inline bool IsSSESupported()
    {
    #if defined(_M_IX86 ) || defined(_M_X64)
       return ( IsProcessorFeaturePresent( PF_XMMI_INSTRUCTIONS_AVAILABLE ) != 0 && IsProcessorFeaturePresent( PF_XMMI64_INSTRUCTIONS_AVAILABLE ) != 0 );
    #else
        return false;
    #endif
    }
    

    -or-

    #include <intrin.h>
    
    inline bool IsSSESupported()
    {
    #if defined(_M_IX86 ) || defined(_M_X64)
       int CPUInfo[4] = {-1};
       __cpuid( CPUInfo, 0 );
    
       if ( CPUInfo[0] < 1  )
           return false;
    
        __cpuid(CPUInfo, 1 );
    
        int edx = 0x4000000 // SSE2
                  | 0x2000000; // SSE
    
        if ( ( CPUInfo[3] & edx ) != edx )
            return false;
    
        return true;
    #else
        return false;
    #endif
    }
    

    Also, keep in mind that MMX, x87 FPU, and AMD 3DNow!* are all deprecated instruction sets for x64 native, so you shouldn't be using them actively anymore in newer code. A good rule of thumb is to avoid using any intrinsic that returns a __m64 or takes a __m64 data type.
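
    As an illustration of the `__m64` rule of thumb, here is a small sketch that adds eight 16-bit integers using the 128-bit SSE2 type `__m128i` instead of two 64-bit MMX operations. The function name `AddInt16x8` and the `SKETCH_HAS_SSE2` macro are invented for this example, and a scalar fallback is included only so the snippet also builds on non-x86 targets.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h> // SSE2 intrinsics
  #define SKETCH_HAS_SSE2 1
#else
  #define SKETCH_HAS_SSE2 0
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for eight 16-bit lanes.
inline void AddInt16x8(const std::int16_t* a, const std::int16_t* b, std::int16_t* out)
{
#if SKETCH_HAS_SSE2
    // Unaligned loads/stores, so the caller's arrays need no special alignment.
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi16(va, vb));
#else
    // Scalar fallback for builds without SSE2.
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::int16_t>(a[i] + b[i]);
#endif
}
```

    No `__m64` value appears anywhere, so nothing here touches the MMX registers or needs an `_mm_empty()` call.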

    You may want to check out this DirectXMath blog series with notes on many of these instruction sets and the relevant processor support requirements.

    Note (*) - All the AMD 3DNow! instructions are deprecated except for PREFETCH and PREFETCHW which were carried forward. First generation Intel64 processors lacked support for these instructions, but they were later added as they are considered part of the core X64 instruction set. Windows 8.1 and Windows 10 x64 require PREFETCHW in particular, although the test is a little odd. Most Intel CPUs prior to Broadwell do not in fact report support for PREFETCHW through CPUID, but they treat the opcode as a no-op rather than throw an 'illegal instruction' exception. As such, the test here is (a) is it supported by CPUID, and (b) if not, does PREFETCHW at least not throw an exception.

    Here's some test code for Visual Studio that demonstrates the PREFETCHW test as well as many other CPUID bits for the x86 and x64 platforms.

    #include <intrin.h>
    #include <stdio.h>
    #include <windows.h>
    #include <excpt.h>
    
    int main()
    {
       unsigned int x = _mm_getcsr();
       printf("%08X\n", x );
    
       bool prefetchw = false;
    
       // See http://msdn.microsoft.com/en-us/library/hskdteyh.aspx
       int CPUInfo[4] = {-1};
       __cpuid( CPUInfo, 0 );
    
       if ( CPUInfo[0] > 0 )
       {
           __cpuid(CPUInfo, 1 );
    
           // EAX
           {
               int stepping = (CPUInfo[0] & 0xf);
               int basemodel = (CPUInfo[0] >> 4) & 0xf;
               int basefamily = (CPUInfo[0] >> 8) & 0xf;
               int xmodel = (CPUInfo[0] >> 16) & 0xf;
               int xfamily = (CPUInfo[0] >> 20) & 0xff;
    
               int family = basefamily + xfamily;
               int model = (xmodel << 4) | basemodel;
    
               printf("Family %02X, Model %02X, Stepping %u\n", family, model, stepping );
           }
    
           // ECX
           if ( CPUInfo[2] & 0x20000000 ) // bit 29
              printf("F16C\n");
    
           if ( CPUInfo[2] & 0x10000000 ) // bit 28
              printf("AVX\n");
    
           if ( CPUInfo[2] & 0x8000000 ) // bit 27
              printf("OSXSAVE\n");
    
           if ( CPUInfo[2] & 0x400000 ) // bit 22
              printf("MOVBE\n");
    
           if ( CPUInfo[2] & 0x100000 ) // bit 20
              printf("SSE4.2\n");
    
           if ( CPUInfo[2] & 0x80000 ) // bit 19
              printf("SSE4.1\n");
    
           if ( CPUInfo[2] & 0x2000 ) // bit 13
              printf("CMPXCHG16B\n");
    
           if ( CPUInfo[2] & 0x1000 ) // bit 12
              printf("FMA3\n");
    
           if ( CPUInfo[2] & 0x200 ) // bit 9
              printf("SSSE3\n");
    
           if ( CPUInfo[2] & 0x1 ) // bit 0
              printf("SSE3\n");
    
           // EDX
           if ( CPUInfo[3] & 0x4000000 ) // bit 26
               printf("SSE2\n");
    
           if ( CPUInfo[3] & 0x2000000 ) // bit 25
               printf("SSE\n");
    
           if ( CPUInfo[3] & 0x800000 ) // bit 23
               printf("MMX\n");
       }
       else
           printf("CPU doesn't support Feature Identifiers\n");
    
       __cpuid( CPUInfo, 0 ); // re-read leaf 0: CPUInfo[0] was overwritten by the leaf 1 query above
    
       if ( CPUInfo[0] >= 7 )
       {
           __cpuidex(CPUInfo, 7, 0);
    
           // EBX
           if ( CPUInfo[1] & 0x100 ) // bit 8
             printf("BMI2\n");
    
           if ( CPUInfo[1] & 0x20 ) // bit 5
             printf("AVX2\n");
    
           if ( CPUInfo[1] & 0x8 ) // bit 3
             printf("BMI\n");
       }
       else
           printf("CPU doesn't support Structured Extended Feature Flags\n");
    
       // Extended features
       __cpuid( CPUInfo, 0x80000000 );
    
       if ( CPUInfo[0] > 0x80000000 )
       {
           __cpuid(CPUInfo, 0x80000001 );
    
           // ECX
           if ( CPUInfo[2] & 0x10000 ) // bit 16
               printf("FMA4\n");
    
           if ( CPUInfo[2] & 0x800 ) // bit 11
               printf("XOP\n");
    
           if ( CPUInfo[2] & 0x100 ) // bit 8
           {
               printf("PREFETCHW\n");
               prefetchw = true;
           }
    
           if ( CPUInfo[2] & 0x80 ) // bit 7
               printf("Misalign SSE\n");
    
           if ( CPUInfo[2] & 0x40 ) // bit 6
               printf("SSE4A\n");
    
           if ( CPUInfo[2] & 0x1 ) // bit 0
               printf("LAHF/SAHF\n");
    
           // EDX
           if ( CPUInfo[3] & 0x80000000 ) // bit 31
               printf("3DNow!\n");
    
           if ( CPUInfo[3] & 0x40000000 ) // bit 30
               printf("3DNowExt!\n");
    
           if ( CPUInfo[3] & 0x20000000 ) // bit 29
               printf("x64\n");
    
           if ( CPUInfo[3] & 0x100000 ) // bit 20
               printf("NX\n");
       }
       else
           printf("CPU doesn't support Extended Feature Identifiers\n");
    
       if ( !prefetchw )
       {
           bool illegal = false;
    
           __try
           {
               static const unsigned int s_data = 0xabcd0123;
    
               _m_prefetchw(&s_data);
           }
           __except (EXCEPTION_EXECUTE_HANDLER)
           {
               illegal = true;
           }
    
           if (illegal)
           {
               printf("PREFETCHW is an invalid instruction on this processor\n");
           }
       }
    }
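
    The test code above combines the base and extended family/model fields unconditionally. Intel's documented rule is slightly narrower: the extended family is added only when the base family is 0xF, and the extended model is used only when the base family is 6 or 0xF. The sketch below applies that rule as a pure function; `CpuSignature` and `DecodeSignature` are names invented for this example.

```cpp
#include <cstdint>

// Hypothetical result type for the decoded CPUID leaf 1 EAX value.
struct CpuSignature
{
    unsigned family;
    unsigned model;
    unsigned stepping;
};

// Hypothetical helper: decodes EAX as returned by CPUID leaf 1.
inline CpuSignature DecodeSignature(std::uint32_t eax)
{
    const unsigned stepping   = eax & 0xf;          // bits 3:0
    const unsigned baseModel  = (eax >> 4) & 0xf;   // bits 7:4
    const unsigned baseFamily = (eax >> 8) & 0xf;   // bits 11:8
    const unsigned extModel   = (eax >> 16) & 0xf;  // bits 19:16
    const unsigned extFamily  = (eax >> 20) & 0xff; // bits 27:20

    CpuSignature s;
    s.stepping = stepping;
    // Extended family only counts when the base family is 0xF.
    s.family = (baseFamily == 0xf) ? (baseFamily + extFamily) : baseFamily;
    // Extended model only counts when the base family is 6 or 0xF.
    s.model = (baseFamily == 0x6 || baseFamily == 0xf)
                  ? ((extModel << 4) | baseModel)
                  : baseModel;
    return s;
}
```

    For example, a signature of 0x000906EA decodes to family 0x6, model 0x9E, stepping 0xA, and 0x00870F10 decodes to family 0x17, model 0x71, stepping 0.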
    

    UPDATE: The fundamental challenge, of course, is how to handle systems that lack AVX support. While the instruction set is useful, the biggest benefit of having an AVX-capable processor is the ability to use the /arch:AVX build switch, which enables global use of the VEX prefix for better SSE/SSE2 code generation. The only problem is that the resulting DLL/EXE is not compatible with systems that lack AVX support.

    As such, for Windows, ideally you should build one EXE for non-AVX systems (assuming SSE/SSE2 only so use /arch:SSE2 instead for x86 code; this setting is implicit for x64 code), a different EXE that is optimized for AVX (using /arch:AVX), and then use CPU detection to determine which EXE to use for a given system.
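
    The selection step can be as small as mapping the detection result to a file name in a launcher. The sketch below assumes two hypothetical build outputs, `MyApp_AVX.exe` and `MyApp_SSE2.exe`; the function name `SelectExecutable` is also invented, and actually starting the chosen process is left out.

```cpp
#include <string>

// Hypothetical launcher helper. The file names are placeholders for the
// /arch:AVX build and the SSE/SSE2-only build; substitute your own.
inline std::string SelectExecutable(bool avxUsable)
{
    // avxUsable should reflect the full check (CPUID bits plus OS support),
    // not just the CPUID AVX bit on its own.
    return avxUsable ? "MyApp_AVX.exe" : "MyApp_SSE2.exe";
}
```

    The launcher itself must be built without /arch:AVX, since it has to run on the systems that lack AVX in order to make the choice.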

    Luckily with Xbox One, we can just always build with /arch:AVX since it's a fixed platform...

    UPDATE 2: For clang/LLVM, you should use slightly different intrinsics for CPUID:

    #if defined(__clang__) || defined(__GNUC__)
        // Requires #include <cpuid.h>
        __cpuid(1, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
    #else
        __cpuid(CPUInfo, 1);
    #endif
    
    #if defined(__clang__) || defined(__GNUC__)
        __cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
    #else
        __cpuidex(CPUInfo, 7, 0);
    #endif
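
    One way to keep these compiler differences out of the feature checks is to hide them behind a single wrapper. This is a sketch under that assumption rather than part of the original answer; `QueryCpuid`, `HasAVX2Bit`, and the `SKETCH_CPUID_*` macros are invented names, and non-x86 builds simply report that CPUID is unavailable.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define SKETCH_CPUID_MSVC 1
#elif (defined(__clang__) || defined(__GNUC__)) && (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>
  #define SKETCH_CPUID_GNU 1
#endif

// Hypothetical wrapper: fills regs with {EAX, EBX, ECX, EDX} for the given
// leaf/subleaf. Returns false when this build has no CPUID intrinsic.
inline bool QueryCpuid(unsigned leaf, unsigned subleaf, int regs[4])
{
#if defined(SKETCH_CPUID_MSVC)
    __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
    return true;
#elif defined(SKETCH_CPUID_GNU)
    unsigned a = 0, b = 0, c = 0, d = 0;
    __cpuid_count(leaf, subleaf, a, b, c, d);
    regs[0] = static_cast<int>(a);
    regs[1] = static_cast<int>(b);
    regs[2] = static_cast<int>(c);
    regs[3] = static_cast<int>(d);
    return true;
#else
    (void)leaf;
    (void)subleaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
    return false;
#endif
}

// Hypothetical helper: reports only the CPUID AVX2 bit (leaf 7, EBX bit 5).
// OS support for the YMM state still has to be checked separately.
inline bool HasAVX2Bit()
{
    int r[4];
    if (!QueryCpuid(0, 0, r) || r[0] < 7)
        return false; // leaf 7 is not available
    QueryCpuid(7, 0, r);
    return (r[1] & 0x20) != 0;
}
```

    The MSVC branch is tested first so that a compiler defining both `_MSC_VER` and `__clang__` takes the `<intrin.h>` path.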
    
