I have to write a not-so-large program in C++, using boost::thread.
The problem at hand is to process a large (maybe thousands or tens of thousands. Hundreds and millon
Not to sound trite, but you use as many threads as you need.
Basically, you can draw a graph of the number of threads against the (real) time to completion. You can also draw one of the number of threads against the total thread time.
The first graph in particular will help you identify where the bottleneck in CPU power lies. At some point, either you will become I/O bound (meaning the disk can't load the data fast enough) or the number of threads will become so large that it hurts the performance of the machine.
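To get the data points for the first graph, you can time the same workload at several thread counts. Here is a minimal sketch of that, not a definitive benchmark. It uses std::thread so it compiles without Boost (boost::thread has a very similar interface). The names process_item, process_all and time_run are made up for illustration, and process_item is only a stand-in for whatever your real per-item work is.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-item work (assumption, not from the question).
long long process_item(int item) {
    long long acc = 0;
    for (int i = 0; i < 1000; ++i) acc += (item * 31LL + i) % 7;
    return acc;
}

// Splits `items` into `num_threads` contiguous chunks, processes the chunks
// in parallel, and returns the sum of all per-item results.
long long process_all(const std::vector<int>& items, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    const std::size_t chunk = (items.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(t * chunk, items.size());
            const std::size_t end = std::min(begin + chunk, items.size());
            long long local = 0;  // accumulate locally, write the slot once
            for (std::size_t i = begin; i < end; ++i) local += process_item(items[i]);
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}

// Returns the wall-clock ("real") seconds taken to process `items`
// with `num_threads` threads.
double time_run(const std::vector<int>& items, std::size_t num_threads) {
    const auto start = std::chrono::steady_clock::now();
    volatile long long sink = process_all(items, num_threads);
    (void)sink;  // keep the result alive so the work is not optimized away
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Call time_run with 1, 2, 4, 8, ... threads on a representative input and plot the results. The thread count where the curve flattens or turns back up is roughly where adding threads stops paying off on that machine.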
The second does happen. I saw one piece of code that ended up creating 30,000+ threads. Capping it at 1,000 made it quicker.
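One way to cap the thread count is to start a fixed number of workers and have them pull task indices from a shared counter, instead of starting one thread per task. This is just a sketch under that assumption, again with std::thread rather than boost::thread so it is self-contained. run_capped is a hypothetical helper, not something from Boost or the standard library.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs task(i) once for every i in [0, task_count), using at most
// `max_threads` threads. Returns the number of threads actually started.
std::size_t run_capped(std::size_t task_count, std::size_t max_threads,
                       const std::function<void(std::size_t)>& task) {
    assert(task && "task must be callable");
    // Never start more threads than there are tasks, and at least one if any work exists.
    const std::size_t n = std::min(task_count, std::max<std::size_t>(max_threads, 1));
    std::atomic<std::size_t> next{0};  // index of the next unclaimed task
    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t t = 0; t < n; ++t) {
        workers.emplace_back([&] {
            // Each worker keeps claiming the next index until none are left.
            for (std::size_t i = next.fetch_add(1); i < task_count; i = next.fetch_add(1)) {
                task(i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return n;
}
```

With this shape, 30,000 tasks and a cap of 8 means 8 threads are created in total, and the cap becomes a single number you can tune using the timing graph.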
The other way to look at this is: how fast is fast enough? The point where I/O becomes a bottleneck is one thing, but you may hit a point before that where it's "fast enough".