A late answer, but:
I would seriously consider Intel TBB. One thing I noticed missing from the C++ standard parallel mode is parallel containers. TBB's containers do not follow the interface of the C++ standard containers, but the TBB authors justify the differences. Besides, TBB comes with a number of examples and design patterns.