I have to write a not-so-large program in C++, using boost::thread.
The problem at hand is to process a large number of files (maybe thousands or tens of thousands; possibly hundreds of millions of records in total). What is the ideal number of threads to use?
There are a lot of variables that will affect performance (OS, filesystem, hard drive speed vs CPU speed, data access patterns, how much processing is done on the data after it is read, etc).
So your best bet is to simply try a test run for every possible thread count, on a representative data set (a big one if possible, so that filesystem caching won't skew the results too badly), and record how long it takes each time. Start with a single thread, then try it again with two threads, and so on until you feel you have enough data points. At the end you should have data that graphs into a nice curve that indicates where the "sweet spot" is. You should be able to do this in a loop so that the results are compiled automatically overnight.