I have to write a not-so-large program in C++, using boost::thread.
The problem at hand is to process a large (maybe thousands or tens of thousands. Hundreds and millon
I'm not too sure about HP-UX, but in the Windows world, we use thread pools to solve this sort of problem. Raymond Chen wrote about this a while back, in fact...
The skinny of it is that I would generally not expect anything to scale well on a CPU-bound load if the number of threads is more than about 2x the number of CPU cores in the system. For I/O-bound loads, you might be able to get away with more, depending on how fast your disk subsystem is, but once you reach about 100 threads or so, I would seriously consider changing the model...
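To make the thread pool idea concrete, here is a minimal sketch of a fixed-size pool: a small number of worker threads pull work items off a shared queue, instead of one thread being spawned per item. It is written against C++11 `std::thread` so that it is self-contained. Boost.Thread offers a very similar interface (including `boost::thread::hardware_concurrency()`), so porting it should be mostly mechanical, but I have not tried it on HP-UX. The names `ThreadPool` and `default_worker_count` are made up for this example, and the factor of 2 is just the rule of thumb above, not a measured value.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: derive a worker count from the core count.
// hardware_concurrency() may return 0 when the count is unknown,
// so fall back to a single core in that case.
inline std::size_t default_worker_count(std::size_t per_core = 2) {
    std::size_t cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;
    return cores * per_core;
}

// Hypothetical minimal pool: a fixed set of workers draining one queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    // Finishes all queued tasks, then joins the workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& t : threads_) t.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue one unit of work; some idle worker will pick it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so workers do not serialize
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;  // declared last: starts after the state above exists
};
```

Usage would look something like `ThreadPool pool(default_worker_count());` followed by one `pool.submit(...)` call per item to process. When `pool` goes out of scope, its destructor waits for the queue to drain. The point of the design is that the number of threads stays tied to the hardware, no matter how many items are queued.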