I'd recommend OpenMP. Microsoft have put it into the Visual C++ 2005 compiler so it's well supported, and you don't need to do anything other than compile with the /openmp switch.
It's simple to use, though obviously it doesn't do everything for you, but then nothing does. I generally use it for running parallel for loops without any hassle. For more complex things I tend to roll my own (eg I have code from ages ago that I cut, paste and modify).
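To give an idea of what a parallel for loop looks like, here is a minimal sketch. The function name `sum_of_squares` and the data are made up for illustration, and this is not the code I actually use. It assumes the compiler has OpenMP turned on (/openmp on Visual C++, -fopenmp on GCC or Clang). Without the switch the pragma is simply ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: sums the squares of the values in `data`.
long long sum_of_squares(const std::vector<int>& data)
{
    // OpenMP 2.0 (the version in Visual C++ 2005) needs a signed integer
    // loop index, so make sure the size fits in an int before casting.
    assert(data.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(data.size());

    long long total = 0;

    // The iterations are divided among the threads. reduction(+:total) gives
    // each thread its own private partial sum and adds them together at the
    // end, which avoids a data race on `total`.
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(data[i]) * data[i];
    }
    return total;
}
```

Calling it is no different from calling a serial function: fill a `std::vector<int>` and pass it in. The only OpenMP-specific part is the single pragma line, which is why it suits this kind of loop so well.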
You could try Cilk++ which looks good, and has an e-book "How to Survive the Multicore Software Revolution".
Both these kinds of system try to parallelize serial code - ie take a for loop and run it on all the cores simultaneously in as easy a way as possible. They don't tend to be general-purpose thread libraries. (eg a research paper (pdf) described the performance of different types of thread pools implemented in OpenMP and suggested 2 new operations should be added to it - yield and sleep. I think they're missing the point of OpenMP a little there.)
As you mentioned OpenMP, I assume you're talking about native C++, not C# or .NET.
Also, if the HPC people (who I assume are experts in this kind of domain) seem to be using OpenMP or MPI, then this is what you should be using, not what the readership of SO is!