I have to write a not-so-large program in C++, using boost::thread.
The problem at hand is to process a large (maybe thousands or tens of thousands. Hundreds and millon
There are two problems here. The first is your question about the ideal number of threads to use for processing this large number of files; the second is how to achieve the best performance.
Let's start with the second problem. To begin with, I would not parallelize per file; I would parallelize the processing done on one file at a time. This helps several parts of your environment:
- The hard drive does not have to seek back and forth between one file and the n - 1 others.
- The operating system file cache stays warm with the data all your threads need, so you will not experience as much cache thrashing.
I admit that the code to parallelize your application gets slightly more complex, but the benefits you'll obtain are significant.
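To give an idea of the extra complexity, here is a minimal sketch of splitting the work on one file across threads. It assumes the file has already been read into memory, and it uses a made-up workload (counting newlines); `count_newlines` and `process_one_file` are hypothetical names, not anything from your program. I wrote it with `std::thread` so it has no external dependency; `boost::thread` can be constructed and joined in much the same way.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical per-chunk work: count the newline characters in [begin, end).
static void count_newlines(const char* begin, const char* end, std::size_t& out) {
    out = static_cast<std::size_t>(std::count(begin, end, '\n'));
}

// Process one already-loaded file with up to n_threads workers.
// Each worker gets a contiguous slice and writes to its own result slot,
// so no locking is needed; the partial results are summed after join().
std::size_t process_one_file(const std::vector<char>& data, unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    const std::size_t chunk = (data.size() + n_threads - 1) / n_threads;  // ceiling division

    std::vector<std::size_t> partial(n_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n_threads; ++i) {
        const std::size_t lo = std::min(data.size(), i * chunk);
        const std::size_t hi = std::min(data.size(), lo + chunk);
        if (lo == hi) break;  // nothing left to hand out
        workers.emplace_back(count_newlines, data.data() + lo, data.data() + hi,
                             std::ref(partial[i]));
    }
    for (std::thread& t : workers) t.join();
    return std::accumulate(partial.begin(), partial.end(), std::size_t{0});
}
```

You would call this once per file, one file after another, so only a single file is being read at any time. Real processing that cannot be cut at arbitrary byte offsets (records, lines, tokens) needs a smarter way to pick the slice boundaries.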
From this, the answer to your question is easy: use at most one thread per core present in your system. This keeps your threads from fighting over the CPU caches and should ultimately give you the best performance your system can deliver.
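As a rough sketch of how to pick that number at run time: the standard library can report the number of hardware threads, and `boost::thread::hardware_concurrency()` does the same job in Boost. The helper name `pick_thread_count` and the fallback of 1 are my own assumptions. Note that the value reported counts logical cores, so on a machine with hyper-threading it may be higher than the number of physical cores.

```cpp
#include <thread>

// Returns the number of worker threads to start: at most one per core,
// and never more than the caller asked for (0 means "no preference").
unsigned pick_thread_count(unsigned requested) {
    unsigned cores = std::thread::hardware_concurrency();  // may return 0 if unknown
    if (cores == 0) cores = 1;  // conservative fallback (an assumption)
    if (requested == 0) return cores;
    return requested < cores ? requested : cores;
}
```

Calling `pick_thread_count(0)` gives one thread per reported core; passing a smaller number lets you leave some cores free for the rest of the system.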
A final point, of course, is that with this type of processing your application will be more considerate of the rest of your system, since accessing n files simultaneously may make your OS unresponsive.