Theres are couple of places in my code base where the same operation is repeated a very large number of times for a large data set. In some cases it\'s taking a considerable tim
For your second point there are several solutions as long as you can separate out the differences into different functions:
Note that because you'd be relying on indirect function calls, the functions that abstract the different operations generally need to represent somewhat higher level functionality or you may lose whatever gains you get from the optimized instruction in the call overhead (in other words don't abstract the individual SSE operations - abstract the work you're doing).
Here's an example using function pointers:
typedef int (*scale_func_ptr)( int scalar, int* pData, int count);
int non_sse_scale( int scalar, int* pData, int count)
{
// do whatever work needs done, without SSE so it'll work on older CPUs
return 0;
}
int sse_scale( int scalar, in pData, int count)
{
// equivalent code, but uses SSE
return 0;
}
// at initialization
scale_func_ptr scale_func = non_sse_scale;
if (useSSE) {
scale_func = sse_scale;
}
// now, when you want to do the work:
scale_func( 12, theData_ptr, 512); // this will call the routine that tailored to SSE
// if the CPU supports it, otherwise calls the non-SSE
// version of the function