I am trying to implement a vector math library that can generate SIMD-optimized versions of a function based on the feature set the processor supports. Currently I have