In Goetz's "Java Concurrency in Practice", in a footnote on page 101, he writes: "For computational problems like this that do no I/O and access no shared data, Ncpu or Ncpu + 1 threads yield optimal throughput."

My usual number is Ncpu + the expected number of concurrent I/O activities.
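As a rough sketch, assuming a plain `Executors.newFixedThreadPool` and that you can estimate how many I/O activities will be in flight at once, the rule of thumb becomes a one-line sizing function. The class `PoolSizing`, the method `poolSize`, and the value `2` for expected I/O are hypothetical illustrations, not taken from the book.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    /**
     * Rule of thumb: Ncpu threads for the computation, plus one extra
     * thread per I/O activity expected to be blocked at the same time.
     */
    static int poolSize(int nCpu, int expectedConcurrentIo) {
        if (nCpu < 1 || expectedConcurrentIo < 0) {
            throw new IllegalArgumentException("need nCpu >= 1 and expectedConcurrentIo >= 0");
        }
        return nCpu + expectedConcurrentIo;
    }

    public static void main(String[] args) {
        int nCpu = Runtime.getRuntime().availableProcessors();
        // Hypothetical estimate: e.g. one log writer plus one result-file writer.
        int expectedConcurrentIo = 2;
        int size = poolSize(nCpu, expectedConcurrentIo);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        try {
            System.out.println("Ncpu = " + nCpu + ", pool size = " + size);
        } finally {
            pool.shutdown();
        }
    }
}
```

With `expectedConcurrentIo = 0` this reduces to the pure-computation Ncpu case from the quote; with a single I/O activity it gives Ncpu + 1.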
The key isn't that 20 threads can write a single file to disk faster than 4 threads. If you have only one thread per CPU, then while one of those threads is blocked writing to disk, your process cannot use the CPU that thread was running on. That CPU effectively sits idle waiting for the write to finish, whereas with one more thread your process can use it for real processing in the meantime.
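The effect can be seen in a small, timing-based sketch. It assumes `Thread.sleep` is a fair stand-in for a thread blocked on a disk write (both leave the CPU free); `BlockingIoDemo` and `computeOverlapsIo` are hypothetical names. The simulated I/O task is submitted first, so with a single worker it holds the only thread and the computation has to wait; with one extra thread the computation runs while the I/O task is blocked.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class BlockingIoDemo {
    // Hypothetical duration of the simulated blocking write.
    static final long IO_MILLIS = 500;

    /** Returns true if the CPU-bound task finished while the simulated I/O task was still blocked. */
    static boolean computeOverlapsIo(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicLong ioDoneAt = new AtomicLong();
        AtomicLong computeDoneAt = new AtomicLong();
        try {
            // Submitted first: with a single worker it occupies the only thread while blocked.
            Future<?> io = pool.submit(() -> {
                try {
                    Thread.sleep(IO_MILLIS); // assumption: stands in for a blocking disk write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                ioDoneAt.set(System.nanoTime());
            });
            // CPU-bound work that can only make progress if a thread is free.
            Future<Long> compute = pool.submit(() -> {
                long sum = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    sum += i;
                }
                computeDoneAt.set(System.nanoTime());
                return sum;
            });
            io.get();
            compute.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        // Compare nanoTime values by difference, as the System.nanoTime Javadoc recommends.
        return computeDoneAt.get() - ioDoneAt.get() < 0;
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  computation overlapped I/O? " + computeOverlapsIo(1));
        System.out.println("2 threads: computation overlapped I/O? " + computeOverlapsIo(2));
    }
}
```

Running `main` should print `false` for one thread and `true` for two, even on a single-CPU machine, because the sleeping thread consumes no CPU while it waits.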