I have to write a not-so-large program in C++, using boost::thread.
The problem at hand is to process a large number of files (maybe thousands or tens of thousands; hundreds of thousands or millions are possible as well), inserting data from each into a global structure.
To elaborate, it really depends on:

- how I/O-bound the problem is
- how big the files are
- how contiguous the files are
- in what order they must be processed
- whether you can determine the disk placement
- how much concurrency you can get in the "global structure insert"
- whether you can "silo" the data structure with a consolidation wrapper (see the sketch after this list)
- the actual CPU cost of the "global structure insert"
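To make the "silo" idea concrete, here is a minimal sketch, assuming boost::thread and a plain map as the global structure (both are illustrative stand-ins, since the question doesn't say what the structure actually is): each worker inserts into its own private map with no locking on the hot path, and a single-threaded consolidation pass merges the silos afterwards.

```cpp
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, long> Silo;   // per-thread partial result

// Hypothetical per-file work: parse each file and insert into the local silo.
void process_files(const std::vector<std::string>& files, Silo& local) {
    for (std::size_t i = 0; i < files.size(); ++i) {
        // ... read and parse files[i], then insert, e.g.:
        local[files[i]] += 1;                // no lock: the silo is thread-private
    }
}

int main() {
    std::vector<std::string> files;          // filled elsewhere
    const std::size_t n = 4;                 // worker count: a placeholder to tune

    // Hand each worker an interleaved slice of the file list.
    std::vector<std::vector<std::string> > slices(n);
    for (std::size_t i = 0; i < files.size(); ++i)
        slices[i % n].push_back(files[i]);

    std::vector<Silo> silos(n);
    boost::thread_group workers;
    for (std::size_t t = 0; t < n; ++t)
        workers.create_thread(boost::bind(&process_files,
                                          boost::cref(slices[t]),
                                          boost::ref(silos[t])));
    workers.join_all();

    // Consolidation wrapper: a single-threaded merge into the global structure.
    Silo global;
    for (std::size_t t = 0; t < n; ++t)
        for (Silo::const_iterator it = silos[t].begin(); it != silos[t].end(); ++it)
            global[it->first] += it->second;
}
```

The point of the silo is that all contention is deferred to the merge, so it only pays off when the consolidation is cheap relative to the per-file inserts.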
For example, if your files reside on a 3-terabyte flash memory array, the solution is different than if they reside on a single disk (where, if the "global structure insert" takes less time than the read, the problem is I/O-bound and you might just as well have a 2-stage pipe with 2 threads: the read stage feeding the insert stage).
But in both cases the architecture would probably be a vertical pipeline of two stages: n reading threads and m writing threads, with n and m determined by the "natural concurrency" of each stage.
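Below is a minimal sketch of that pipeline using boost::thread (which the question already mandates): n reader threads push file payloads into a blocking bounded queue and m inserter threads drain it. The queue capacity, payload type, and thread counts are placeholder assumptions; the bounded capacity is what provides back-pressure, so the readers can never run far ahead of the inserters.

```cpp
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <deque>
#include <string>

// Blocking bounded queue connecting the read stage to the insert stage.
class BoundedQueue {
    std::deque<std::string> items_;
    std::size_t capacity_;
    bool closed_;
    boost::mutex mutex_;
    boost::condition_variable not_full_, not_empty_;
public:
    explicit BoundedQueue(std::size_t cap) : capacity_(cap), closed_(false) {}

    void push(const std::string& item) {
        boost::unique_lock<boost::mutex> lock(mutex_);
        while (items_.size() >= capacity_)
            not_full_.wait(lock);            // back-pressure on the readers
        items_.push_back(item);
        not_empty_.notify_one();
    }

    bool pop(std::string& out) {             // false once closed and drained
        boost::unique_lock<boost::mutex> lock(mutex_);
        while (items_.empty() && !closed_)
            not_empty_.wait(lock);
        if (items_.empty())
            return false;
        out = items_.front();
        items_.pop_front();
        not_full_.notify_one();
        return true;
    }

    void close() {                           // called once all readers are done
        boost::unique_lock<boost::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }
};

void read_stage(BoundedQueue& queue) {
    // ... open each file assigned to this reader, load its contents, then:
    queue.push("<file contents>");           // placeholder payload
}

void insert_stage(BoundedQueue& queue) {
    std::string item;
    while (queue.pop(item)) {
        // ... perform the "global structure insert" for this item
    }
}

int main() {
    const std::size_t n = 2, m = 2;          // tune to each stage's natural concurrency
    BoundedQueue queue(64);                  // capacity is an arbitrary placeholder

    boost::thread_group readers, inserters;
    for (std::size_t i = 0; i < n; ++i)
        readers.create_thread(boost::bind(&read_stage, boost::ref(queue)));
    for (std::size_t i = 0; i < m; ++i)
        inserters.create_thread(boost::bind(&insert_stage, boost::ref(queue)));

    readers.join_all();                      // no more items will be produced
    queue.close();                           // let the inserters drain and exit
    inserters.join_all();
}
```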
Creating a thread per file will probably lead to disk thrashing. Just as you tailor the number of threads in a CPU-bound process to the naturally achievable CPU concurrency (going above that creates context-switching overhead, a.k.a. thrashing), the same is true on the I/O side; in a sense you can think of disk thrashing as "context switching on the disk".
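As a sizing sketch: the CPU-bound side can at least be capped at the reported hardware concurrency, while the disk side has no equivalent API, so the reader count below is a placeholder you would tune empirically.

```cpp
#include <boost/thread.hpp>
#include <algorithm>
#include <iostream>

int main() {
    // hardware_concurrency() returns 0 when unknown, hence the floor of 1.
    unsigned inserters = std::max(1u, boost::thread::hardware_concurrency());

    // Assumption: on a single spindle, more than 1-2 concurrent readers
    // mostly buys you seek thrashing; on a flash array this could be far higher.
    unsigned readers = 2;

    std::cout << "readers: " << readers
              << ", inserters: " << inserters << std::endl;
}
```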