Let\'s say I have a 4-core CPU, and I want to run some process in the minimum amount of time. The process is ideally parallelizable, so I can run chunks of it on an infinite
You will find how many threads you can run on your machine by running htop or ps command that returns number of process on your machine.
You can use man page about 'ps' command.
man ps
If you want to calculate number of all users process, you can use one of these commands:
ps -aux| wc -l
ps -eLf | wc -l
Calculating number of an user process:
ps --User root | wc -l
Also, you can use "htop" [Reference]:
Installing on Ubuntu or Debian:
sudo apt-get install htop
Installing on Redhat or CentOS:
yum install htop
dnf install htop [On Fedora 22+ releases]
If you want to compile htop from source code, you will find it here.
One example of lots of threads ("thread pool") vs one per core is that of implementing a web-server in Linux or in Windows.
Since sockets are polled in Linux a lot of threads may increase the likelihood of one of them polling the right socket at the right time - but the overall processing cost will be very high.
In Windows the server will be implemented using I/O Completion Ports - IOCPs - which will make the application event driven: if an I/O completes the OS launches a stand-by thread to process it. When the processing has completed (usually with another I/O operation as in a request-response pair) the thread returns to the IOCP port (queue) to wait for the next completion.
If no I/O has completed there is no processing to be done and no thread is launched.
Indeed, Microsoft recommends no more than one thread per core in IOCP implementations. Any I/O may be attached to the IOCP mechanism. IOCs may also be posted by the application, if necessary.
The answer depends on the complexity of the algorithms used in the program. I came up with a method to calculate the optimal number of threads by making two measurements of processing times Tn and Tm for two arbitrary number of threads ‘n’ and ‘m’. For linear algorithms, the optimal number of threads will be N = sqrt ( (mn(Tm*(n-1) – Tn*(m-1)))/(nTn-mTm) ) .
Please read my article regarding calculations of the optimal number for various algorithms: pavelkazenin.wordpress.com
4000 threads at one time is pretty high.
The answer is yes and no. If you are doing a lot of blocking I/O in each thread, then yes, you could show significant speedups doing up to probably 3 or 4 threads per logical core.
If you are not doing a lot of blocking things however, then the extra overhead with threading will just make it slower. So use a profiler and see where the bottlenecks are in each possibly parallel piece. If you are doing heavy computations, then more than 1 thread per CPU won't help. If you are doing a lot of memory transfer, it won't help either. If you are doing a lot of I/O though such as for disk access or internet access, then yes multiple threads will help up to a certain extent, or at the least make the application more responsive.
Benchmark.
I'd start ramping up the number of threads for an application, starting at 1, and then go to something like 100, run three-five trials for each number of threads, and build yourself a graph of operation speed vs. number of threads.
You should that the four thread case is optimal, with slight rises in runtime after that, but maybe not. It may be that your application is bandwidth limited, ie, the dataset you're loading into memory is huge, you're getting lots of cache misses, etc, such that 2 threads are optimal.
You can't know until you test.
I know this question is rather old, but things have evolved since 2009.
There are two things to take into account now: the number of cores, and the number of threads that can run within each core.
With Intel processors, the number of threads is defined by the Hyperthreading which is just 2 (when available). But Hyperthreading cuts your execution time by two, even when not using 2 threads! (i.e. 1 pipeline shared between two processes -- this is good when you have more processes, not so good otherwise. More cores are definitively better!)
On other processors you may have 2, 4, or even 8 threads. So if you have 8 cores each of which support 8 threads, you could have 64 processes running in parallel without context switching.
"No context switching" is obviously not true if you run with a standard operating system which will do context switching for all sorts of other things out of your control. But that's the main idea. Some OSes let you allocate processors so only your application has access/usage of said processor!
From my own experience, if you have a lot of I/O, multiple threads is good. If you have very heavy memory intensive work (read source 1, read source 2, fast computation, write) then having more threads doesn't help. Again, this depends on how much data you read/write simultaneously (i.e. if you use SSE 4.2 and read 256 bits values, that stops all threads in their step... in other words, 1 thread is probably a lot easier to implement and probably nearly as speedy if not actually faster. This will depend on your process & memory architecture, some advanced servers manage separate memory ranges for separate cores so separate threads will be faster assuming your data is properly filed... which is why, on some architectures, 4 processes will run faster than 1 process with 4 threads.)