Taking advantage of SSE and other CPU extensions

南笙 2021-02-04 06:41

There are a couple of places in my code base where the same operation is repeated a very large number of times for a large data set. In some cases it's taking a considerable time…

5 Answers
  •  忘了有多久
    2021-02-04 07:12

    For your second point, there are several solutions, as long as you can separate the differences out into different functions:

    • plain old C function pointers
    • dynamic linking (which generally relies on C function pointers)
    • if you're using C++, different classes representing the support for different architectures, combined with virtual functions, can help immensely with this.
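    In C, the closest equivalent of the C++ virtual-function approach is a struct of function pointers (a hand-rolled vtable) with one instance per instruction set, chosen once at start-up. The sketch below assumes GCC or Clang on x86, where `__builtin_cpu_supports` and `__attribute__((target(...)))` are available; the `simd_ops` struct and its `add`/`sum` operations are hypothetical names for illustration, and non-x86 builds simply get the scalar table.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Hypothetical "vtable": one entry per high-level operation. */
typedef struct {
    const char *name;
    void  (*add)(const float *a, const float *b, float *out, size_t n);
    float (*sum)(const float *a, size_t n);
} simd_ops;

/* Portable implementations, usable on any CPU. */
static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

static float sum_scalar(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

static const simd_ops scalar_ops = { "scalar", add_scalar, sum_scalar };

#if HAVE_X86
/* SSE implementations: 4 floats per instruction, scalar tail loop. */
__attribute__((target("sse")))
static void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

__attribute__((target("sse")))
static float sum_sse(const float *a, size_t n)
{
    __m128 acc = _mm_setzero_ps();          /* 4 running partial sums */
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i)
        s += a[i];
    return s;
}

static const simd_ops sse_ops = { "sse", add_sse, sum_sse };
#endif

/* Pick the implementation set once; callers only ever see simd_ops. */
const simd_ops *select_ops(void)
{
#if HAVE_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        return &sse_ops;
#endif
    return &scalar_ops;
}
```

    Callers would fetch the table once (`const simd_ops *ops = select_ops();`) and then call `ops->add(...)` or `ops->sum(...)`. Because `sum_sse` adds in a different order than `sum_scalar`, floating-point results can differ in the last bits for general inputs.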

    Note that because you'd be relying on indirect function calls, the functions that abstract the different operations generally need to represent somewhat higher-level functionality; otherwise the call overhead may eat up whatever you gain from the optimized instructions (in other words, don't abstract the individual SSE operations; abstract the work you're doing).
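    To make the granularity point concrete, here is a minimal plain-C sketch (no SIMD, so the focus stays on where the indirect call sits). `scale_fine` pays for an indirect call on every 4-element chunk, the fine-grained design warned against above, while `scale_coarse` makes one indirect call for the whole data set and lets the selected routine own the loop. All names here are hypothetical.

```c
#include <stddef.h>

/* Too fine-grained: the abstraction covers one 4-element chunk. */
typedef void (*scale4_fn)(int scalar, int *p);

/* Coarse-grained: the abstraction covers the whole data set. */
typedef void (*scale_all_fn)(int scalar, int *p, size_t n);

static void scale4_plain(int scalar, int *p)
{
    for (int k = 0; k < 4; ++k)
        p[k] *= scalar;
}

static void scale_all_plain(int scalar, int *p, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        p[i] *= scalar;
}

/* Avoid: the dispatch sits inside the hot loop, so the call overhead is
   paid about n/4 times and the compiler cannot inline through the pointer. */
void scale_fine(scale4_fn f, int scalar, int *p, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        f(scalar, p + i);          /* indirect call every iteration */
    for (; i < n; ++i)             /* leftover elements */
        p[i] *= scalar;
}

/* Prefer: dispatch once; the selected routine runs the whole loop and is
   free to vectorise it internally. */
void scale_coarse(scale_all_fn f, int scalar, int *p, size_t n)
{
    f(scalar, p, n);               /* single indirect call */
}
```

    Both functions produce the same result; only the number of indirect calls differs.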

    Here's an example using function pointers:

    typedef int (*scale_func_ptr)( int scalar, int* pData, int count);
    
    
    int non_sse_scale( int scalar, int* pData, int count)
    {
        // do whatever work needs to be done, without SSE, so it'll work on older CPUs
    
        return 0;
    }
    
    int sse_scale( int scalar, int* pData, int count)
    {
        // equivalent code, but uses SSE
    
        return 0;
    }
    
    
    // at initialization
    
    scale_func_ptr scale_func = non_sse_scale;
    
    if (useSSE) {
        scale_func = sse_scale;
    }
    
    
    // now, when you want to do the work:
    
    scale_func( 12, theData_ptr, 512);  // this will call the routine that is tailored to SSE
                                        // if the CPU supports it, otherwise it calls the non-SSE
                                        // version of the function
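    For reference, here is one way the example above could be filled in as a compilable unit. It assumes GCC or Clang on x86: `__builtin_cpu_supports("sse4.1")` performs the run-time check that `useSSE` stood for, and `__attribute__((target("sse4.1")))` lets just `sse_scale` use SSE4.1 (`_mm_mullo_epi32`, a 4-way 32-bit multiply) without building the whole program for SSE4.1. The choice of SSE4.1 and the `init_scale` helper are assumptions of this sketch; on other architectures the code falls back to `non_sse_scale`.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

typedef int (*scale_func_ptr)(int scalar, int* pData, int count);

/* Portable fallback: multiplies every element in place. */
int non_sse_scale(int scalar, int* pData, int count)
{
    for (int i = 0; i < count; ++i)
        pData[i] *= scalar;
    return 0;
}

#if HAVE_X86
/* SSE4.1 version. The target attribute enables SSE4.1 for this function
   only, so the binary still runs on CPUs without it (as long as this
   function is never called there). */
__attribute__((target("sse4.1")))
int sse_scale(int scalar, int* pData, int count)
{
    __m128i factor = _mm_set1_epi32(scalar);
    int i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i*)(pData + i));
        v = _mm_mullo_epi32(v, factor);      /* 4 x 32-bit multiply */
        _mm_storeu_si128((__m128i*)(pData + i), v);
    }
    for (; i < count; ++i)                   /* leftover elements */
        pData[i] *= scalar;
    return 0;
}
#endif

static scale_func_ptr scale_func = non_sse_scale;

/* Call once at start-up, before any call through scale_func. */
void init_scale(void)
{
#if HAVE_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        scale_func = sse_scale;
#endif
}
```

    Call `init_scale()` once at start-up, then `scale_func(12, theData_ptr, 512);` exactly as in the answer. The scalar loop at the end of `sse_scale` handles the last `count % 4` elements.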
    
