Now that C++ is adding thread_local
storage as a language feature, I\'m wondering a few things:
thead_local
likely to
Storage space: size of the variable * number of threads, or possibly (sizeof(var) + sizeof(var*)) * number of threads.
There are two basic ways of implementing thread-local storage:
Using some sort of system call that gets information about the current kernel thread. Sloooow.
Using some pointer, probably in a processor register, that is set properly at every thread context switch by the kernel - at the same time as all the other registers. Cheap.
On intel platforms, variant 2 is usually implemented via some segment register (FS or GS, I don't remember). Both GCC and MSVC support this. Access times are therefore about as fast as for global variables.
It is also possible, but I haven't seen it yet in practice, for this to be implemented via existing library functions like pthread_getspecific
. Performance would then be like 1. or 2., plus library call overhead. Keep in mind that variant 2. + library call overhead is still a lot faster than a kernel call.