I need some idea how to write a C++ cross platform implementation of a few parallelizable problems in a way so I can take advantage of SIMD (SSE, SPU, etc) if available. As well
Have you thought about using existing solutions like liboil? It implements lots of common SIMD operations and can decide at runtime whether to use SIMD/non-SIMD code (using function pointers assigned by an initialization function).