I have to write a not-so-large program in C++, using boost::thread.
The problem at hand is to process a large (maybe thousands or tens of thousands. Hundreds and millon
The answer depends somewhat on how CPU-intensive the processing you need to perform on each file is.
At one extreme, where processing time dominates I/O time, the only benefit threading gives you is the ability to use multiple cores (and possibly hyperthreading) to get the maximum available processing power out of your CPU. In this case you'd want to aim for a number of worker threads roughly equal to the number of logical cores on the system.
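As a rough sketch of how that thread count could be picked at run time: the helper name `cpu_bound_worker_count` is made up for illustration, and the example uses `std::thread` so it compiles without Boost. `boost::thread::hardware_concurrency()` reports the same kind of value if you stay with boost::thread.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical helper: choose a worker count for CPU-bound processing.
// hardware_concurrency() is only a hint and may return 0 when the number
// of logical cores cannot be determined, so fall back to one worker.
std::size_t cpu_bound_worker_count() {
    const unsigned int logical_cores = std::thread::hardware_concurrency();
    return logical_cores == 0 ? 1 : logical_cores;
}
```

Treat the result as a starting point and measure; the best number for a given workload may be a little above or below the logical core count.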
At the other extreme, where I/O is your bottleneck, you aren't going to see much benefit from multiple threads, since they will spend most of their time sleeping while they wait for I/O to complete. In that case you'd want to focus on maximizing your I/O throughput rather than your CPU utilization. If you are I/O bound on a single unfragmented hard drive or a DVD, having multiple threads would likely hurt performance, since you'd get maximum I/O throughput from sequential reads on a single thread. If the drive is fragmented, or you have a RAID array or similar, then having multiple I/O requests in flight simultaneously might boost your I/O throughput, since the controller may be able to intelligently reorder them to make more efficient reads.
I think it might be helpful to view this as two separate problems: how to get maximum I/O throughput for your file reads, and how to make maximum use of your CPU for processing the files. You would probably get the best throughput from a small number of I/O threads kicking off I/O requests, plus a pool of worker threads, roughly equal in number to the logical CPU cores, processing the data as it becomes available. Whether a more complex setup like that is worth the effort depends on where the bottlenecks are in your particular problem, though.
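A minimal sketch of that split, assuming a single reader thread and a simple mutex-protected queue. `FileData`, `WorkQueue`, `process` and `run_pipeline` are all hypothetical names, and `process` just counts bytes as a stand-in for the real work. It uses the standard library threading types; boost::thread offers equivalents (`boost::mutex`, `boost::condition_variable`).

```cpp
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unit of work: one file's contents, already read into memory.
struct FileData {
    std::string path;
    std::string bytes;
};

// Minimal blocking queue shared between the reader and the workers.
class WorkQueue {
public:
    void push(FileData item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns false once the queue
    // has been closed and fully drained.
    bool pop(FileData& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<FileData> items_;
    bool closed_ = false;
};

// Placeholder for the real per-file processing: here it only counts bytes.
std::size_t process(const FileData& file) { return file.bytes.size(); }

// One reader thread reads the files in order; worker_count threads process them.
std::size_t run_pipeline(const std::vector<std::string>& paths, unsigned worker_count) {
    WorkQueue queue;
    std::mutex total_mutex;
    std::size_t total = 0;

    std::thread reader([&] {
        for (const std::string& path : paths) {
            std::ifstream in(path, std::ios::binary);
            std::string bytes((std::istreambuf_iterator<char>(in)),
                              std::istreambuf_iterator<char>());
            queue.push(FileData{path, std::move(bytes)});
        }
        queue.close();
    });

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < worker_count; ++i) {
        workers.emplace_back([&] {
            FileData file;
            while (queue.pop(file)) {
                const std::size_t result = process(file);
                std::lock_guard<std::mutex> lock(total_mutex);
                total += result;
            }
        });
    }

    reader.join();
    for (std::thread& worker : workers) worker.join();
    return total;
}
```

Calling `run_pipeline(paths, n)` with `n` set to the logical core count matches the layout described above. Note that this queue is unbounded, so a reader that outruns the workers will keep buffering file contents in memory; a real version would probably cap the queue size and make `push` block when it is full.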