I'm currently developing an open-source 3D application framework in C++ (C++11). My own math library is designed like the XNA math library, also with SIMD in mind. But curren
I suggest that you learn about expression templates (custom operator implementations that return lightweight proxy objects instead of computed results). That way you avoid the performance-killing loads and stores around each individual operation and do them only once for the entire expression.
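Here is a minimal C++11 sketch of the idea. It assumes a hypothetical scalar `Vec4` type. It is not your library's API and not XNA's. Each operator returns a proxy node (`VecAdd`, `VecMul`, `VecScale`, all hypothetical names) that only stores references to its operands. The actual computation runs in a single loop when the expression is assigned to a `Vec4`, so `s * x + y` makes one pass with no intermediate `Vec4` temporaries. In a SIMD version, each node would presumably produce a whole register (for example an `__m128`) per evaluation step instead of one `float`, and the same proxy structure should carry over.

```cpp
#include <cassert>
#include <cstddef>

// CRTP base: lets the operators accept "any vector expression" generically.
template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Hypothetical 4-component vector; the only type that owns storage.
struct Vec4 : VecExpr<Vec4> {
    float data[4];

    Vec4() : data{0.0f, 0.0f, 0.0f, 0.0f} {}
    Vec4(float x, float y, float z, float w) : data{x, y, z, w} {}

    // Materializing an expression is where all the work happens:
    // one loop, one store per component, no intermediate Vec4 objects.
    template <typename E>
    Vec4(const VecExpr<E>& e) { assign(e.self()); }

    template <typename E>
    Vec4& operator=(const VecExpr<E>& e) { assign(e.self()); return *this; }

    float  operator[](std::size_t i) const { assert(i < 4); return data[i]; }
    float& operator[](std::size_t i)       { assert(i < 4); return data[i]; }

private:
    template <typename E>
    void assign(const E& e) {
        for (std::size_t i = 0; i < 4; ++i) data[i] = e[i];
    }
};

// Proxy nodes: hold references to their operands and compute one
// component on demand. Nothing is evaluated when a node is created.
template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {
    const L& lhs; const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) {}
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

template <typename L, typename R>
struct VecMul : VecExpr<VecMul<L, R>> {  // component-wise product
    const L& lhs; const R& rhs;
    VecMul(const L& a, const R& b) : lhs(a), rhs(b) {}
    float operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
};

template <typename E>
struct VecScale : VecExpr<VecScale<E>> {  // scalar * vector
    float s; const E& v;
    VecScale(float k, const E& e) : s(k), v(e) {}
    float operator[](std::size_t i) const { return s * v[i]; }
};

// The operators only build the expression tree.
template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) {
    return VecAdd<L, R>(a.self(), b.self());
}

template <typename L, typename R>
VecMul<L, R> operator*(const VecExpr<L>& a, const VecExpr<R>& b) {
    return VecMul<L, R>(a.self(), b.self());
}

template <typename E>
VecScale<E> operator*(float s, const VecExpr<E>& e) {
    return VecScale<E>(s, e.self());
}

// s * x + y builds VecAdd<VecScale<Vec4>, Vec4> and is evaluated
// in a single loop when it is converted to the returned Vec4.
Vec4 axpy(float s, const Vec4& x, const Vec4& y) { return s * x + y; }
```

One caveat with this design: because the proxies hold references, `auto e = 2.0f * x + y;` would keep a reference to a `VecScale` temporary that is destroyed at the end of that statement, so expressions should be assigned to a `Vec4` directly. Component-wise operations like these are safe when the destination also appears on the right-hand side (`x = x + y`), but an operation where component `i` reads other components (a cross product, for instance) would need a temporary in that case.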