Is there any good empirical data on the cost of context switching between threads of the same process on Linux (x86 and x86_64, mainly, are of interest)? I'm talking about
(Disclaimer: This isn't a direct answer to the question; it's just some suggestions that I hope will be helpful).
Firstly, the numbers you're getting certainly sound like they're in the right ballpark. Note, however, that interrupt / trap latency can vary a lot among different CPU models implementing the same ISA. It's also a different story if your threads have used floating-point or vector operations, because if they haven't, the kernel avoids saving and restoring the floating-point / vector unit state.
You should be able to get more accurate numbers by using the kernel tracing infrastructure - perf sched in particular is designed to measure and analyse scheduler latency.
If your goal is to model thread-per-connection servers, then you probably shouldn't be measuring involuntary context switch latency - usually in such a server, the majority of context switches will be voluntary, as a thread blocks in read() waiting for more data from the network. Therefore, a better testbed might involve measuring the latency from one thread blocking in a read() to another being woken up from the same.
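For concreteness, here's a minimal sketch of that kind of testbed (untested; ITERS, the pipe names and the byte payload are just illustrative choices, not anything from the question): two threads ping-pong a single byte over a pair of pipes, so each iteration pays one block-in-read() wakeup in each direction. Running it pinned to one CPU (e.g. under taskset -c 0) makes each wakeup force an actual context switch rather than a cross-core wakeup.

```c
/* Minimal sketch, assuming Linux + pthreads (ITERS and the pipe names are
 * arbitrary illustrative choices). Two threads ping-pong one byte over a
 * pair of pipes: each iteration is one block-in-read() wakeup in each
 * direction, so halve the per-iteration time for a rough one-way cost.
 * Run pinned to a single CPU (e.g. taskset -c 0 ./a.out) so each wakeup
 * forces an actual context switch instead of a cross-core wakeup. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100000L

static int ping[2], pong[2];   /* ping: main -> worker, pong: worker -> main */

static void *worker(void *arg)
{
    char c;
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        if (read(ping[0], &c, 1) != 1)  abort();  /* block until main writes */
        if (write(pong[1], &c, 1) != 1) abort();  /* then wake main back up  */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    struct timespec t0, t1;
    char c = 'x';

    if (pipe(ping) != 0 || pipe(pong) != 0) abort();
    if (pthread_create(&tid, NULL, worker, NULL) != 0) abort();

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERS; i++) {
        if (write(ping[1], &c, 1) != 1) abort();  /* wake the worker          */
        if (read(pong[0], &c, 1) != 1)  abort();  /* block until it answers   */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(tid, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per one-way read() wakeup\n", ns / ITERS / 2.0);
    return 0;
}
```

Compile with something like gcc -O2 -pthread, and compare the number it prints against the involuntary-switch figures you've been measuring.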
Note that in a well-written multiplexing server under heavy load, the transition from fd X to fd Y will often happen within the same single system call, as the server iterates over the list of ready file descriptors returned from a single epoll_wait() call. One thread also ought to have a smaller cache footprint than multiple threads, simply through having only one stack. I suspect the only way to settle the matter (for some definition of "settle") might be to have a benchmark shootout...
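Just to illustrate the shape being described (handle_data() and the stdin stand-in below are purely hypothetical, not a real server), here's a bare-bones epoll loop: servicing fd X and then fd Y is nothing more than the next trip around the inner for loop, inside the same wakeup from the same epoll_wait() call.

```c
/* Bare-bones sketch of a single-threaded multiplexing loop (handle_data()
 * and the stdin stand-in are illustrative only). One epoll_wait() returns
 * a batch of ready descriptors; moving from fd X to fd Y is just the next
 * iteration of the inner loop - no context switch, no second stack. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

static void handle_data(int fd)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);  /* drain whatever arrived */
    if (n <= 0)
        close(fd);                          /* EOF or error: drop the fd */
}

int main(void)
{
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); return 1; }

    /* Watch stdin as a stand-in for a set of connected sockets. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = STDIN_FILENO };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &ev) < 0) {
        perror("epoll_ctl");
        return 1;
    }

    for (;;) {
        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (n < 0) { perror("epoll_wait"); return 1; }

        /* Every fd that became ready in this one wakeup is handled here,
         * back to back, on a single stack. */
        for (int i = 0; i < n; i++)
            handle_data(events[i].data.fd);
    }
}
```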