Now that C++ is adding thread_local
storage as a language feature, I\'m wondering a few things:
thead_local
likely to
A description for how it works on Linux by Uli Drepper (maintainer of glibc) can be found here: www.akkadia.org/drepper/tls.pdf
The requirement to handle dynamically loaded modules etc. make the entire mechanism a bit convoluted, which perhaps partly explains why the document weights in at 79 pages (!).
Memory-usage-wise, each per-thread variable obviously needs it's own per-thread memory (although in some cases this can be done lazily such that the space is allocated only once the variable is first accessed), and then there's some extra datastructures that are needed for offset tables etc.
Performance-wise, the extra cost to access a TLS variable mostly revolves around retrieving the address of the variable. On x86 Linux the GS register is used as a start to get a thread id, on x86-64 FS. Usually there is a few pointer dereferences, and a function call (__tls_get_addr) for dynamically loaded code. There's also the cost that creating a new thread is slower because the implementation needs to allocate space and possibly initialize all the TLS vars (if not done lazily).
TLS is nice for easily making some old thread-unsafe code patterns thread-safe (think errno), but for new code designed from the start for a multi-threaded world it's very seldom needed.