I need some idea how to write a C++ cross platform implementation of a few parallelizable problems in a way so I can take advantage of SIMD (SSE, SPU, etc) if available. As well
The most impressive approach to SIMD-scaling I've seen is the RTFact ray-tracing framework: slides, paper. Well worth a look. The researchers are closely associated with Intel (Saarbrucken now hosts the Intel Visual Computing Institute) so you can be sure forward scaling onto AVX and Larrabee was on their minds.
Intel's Ct "data parallelism" template library looks quite promising too.