Ulrich Drepper's paper on thread-local storage outlines the TLS ABI for several different CPU architectures, but I'm finding it insufficient as a basis for implementing TLS.
The best I can gather so far is:
For either TLS variant, __tls_get_addr or other arch-specific functions must exist and have the correct semantics for looking up any TLS object, and the relative offset between any two TLS segments must be a runtime constant (the same offset for every thread).
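To make "the correct semantics" concrete, here is a minimal C sketch of what a lookup function has to guarantee: within one thread, the same (module, offset) pair always yields the same address, and each thread gets its own copy, initialized from the module's .tdata image with the .tbss part zeroed. The tls_index struct is the two-word layout passed on x86-64, for example. Everything else (the per-thread dtv array, the modules table, and the name sketch_tls_get_addr) is hypothetical. The real function takes only the tls_index and finds the current thread via the thread pointer. Block alignment (p_align) is ignored here, on the assumption that calloc's alignment suffices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Argument passed by general-dynamic access sequences (x86-64 layout):
   module ID plus offset within that module's TLS segment. */
typedef struct {
    unsigned long ti_module;
    unsigned long ti_offset;
} tls_index;

/* Hypothetical description of a loaded module's PT_TLS segment. */
struct tls_module {
    size_t size;                /* p_memsz */
    const unsigned char *init;  /* .tdata initialization image */
    size_t init_size;           /* p_filesz; the rest (.tbss) is zeroed */
};

/* Hypothetical per-thread state: a DTV indexed by module ID (slot 0 unused). */
#define MAX_MODULES 4
struct thread {
    unsigned char *dtv[MAX_MODULES];
};

static const unsigned char mod1_init[4] = { 1, 2, 3, 4 };
static const struct tls_module modules[MAX_MODULES] = {
    [1] = { 16, mod1_init, sizeof mod1_init },
    [2] = { 8, NULL, 0 },
};

/* Sketch of the lookup semantics; the thread is passed explicitly here
   instead of being found through the thread pointer. */
void *sketch_tls_get_addr(struct thread *self, const tls_index *ti)
{
    assert(ti->ti_module >= 1 && ti->ti_module < MAX_MODULES);
    const struct tls_module *m = &modules[ti->ti_module];
    assert(ti->ti_offset < m->size);

    unsigned char *block = self->dtv[ti->ti_module];
    if (block == NULL) {
        /* First access from this thread: allocate this module's block. */
        block = calloc(1, m->size);           /* zero-fills the .tbss part */
        if (block == NULL)
            abort();
        if (m->init_size != 0)
            memcpy(block, m->init, m->init_size); /* copy the .tdata image */
        self->dtv[ti->ti_module] = block;
    }
    return block + ti->ti_offset;
}
```

Allocating lazily on first access is only one option. An implementation could just as well allocate every block eagerly at thread creation or dlopen time, since none of this bookkeeping is visible to applications.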
For TLS variant II (i386, etc.), the "thread pointer register" (which may not actually be a register, but perhaps some mechanism like %gs:0 or even a trap into kernelspace; for simplicity, though, let's just call it a register) points just past the end of the main executable's TLS segment, where "just past the end" includes rounding up to the next multiple of the TLS segment's alignment.
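The arithmetic this implies for the main executable can be sketched as follows. The sketch assumes the formula from Drepper's paper, under which the executable's segment starts round(p_memsz, p_align) bytes below the thread pointer. The function names are hypothetical. The result is the constant, negative TP-relative displacement that a local-exec access would resolve to.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align; align must be a power of two. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Variant II: the main executable's TLS segment (size tls_memsz, alignment
   tls_align) ends, after rounding up, exactly at the thread pointer. A
   variable at offset `off` in the segment is therefore at a fixed negative
   displacement from TP, identical in every thread. */
static ptrdiff_t variant2_tp_offset(size_t tls_memsz, size_t tls_align, size_t off)
{
    assert(off < tls_memsz);
    return (ptrdiff_t)off - (ptrdiff_t)round_up(tls_memsz, tls_align);
}
```

For example, a 20-byte segment with 16-byte alignment occupies the 32 bytes below TP. A variable at offset 4 is therefore at TP - 28, and the 12 bytes of rounding padding lie between the end of the segment and TP.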
For TLS variant I, the "thread pointer register" points to some fixed offset from the beginning of the main executable's TLS segment. This offset varies by arch. On some ugly RISC archs, it has been chosen to maximize the amount of TLS reachable via signed 16-bit offsets. That strikes me as useless: the compiler has no way of knowing whether the relocated offset will fit in 16 bits, so it must always generate the slower, larger 32-bit-offset code using load-upper/add instructions.
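The 16-bit question can be made concrete with a little arithmetic. The sketch below assumes a hypothetical TP_BIAS of 0x7000, which as far as I know is the value MIPS and PowerPC use. It computes a variable's TP-relative displacement, checks whether that displacement fits a signed 16-bit immediate, and otherwise splits it into the %hi/%lo parts a load-upper/add sequence needs. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-arch constant: how far past the start of the main
   executable's TLS segment the thread pointer points (assumed 0x7000). */
#define TP_BIAS 0x7000

/* Variant I: TP-relative displacement of a variable at offset `off`. */
static int32_t variant1_tp_offset(uint32_t off)
{
    assert(off <= (uint32_t)INT32_MAX);
    return (int32_t)off - TP_BIAS;
}

/* Can a single load/store reach it with a signed 16-bit immediate? */
static int fits_s16(int32_t d)
{
    return d >= INT16_MIN && d <= INT16_MAX;
}

/* Split d into %hi/%lo parts for a load-upper/add pair. The add
   sign-extends the low half, so the high half is adjusted to compensate:
   hi * 65536 + lo == d. */
static void split_hi_lo(int32_t d, int32_t *hi, int16_t *lo)
{
    *lo = (int16_t)(d & 0xffff);
    *hi = (int32_t)(((int64_t)d - *lo) / 65536);  /* exact division */
    assert((int64_t)*hi * 65536 + *lo == d);
}
```

With a bias of 0x7000, only offsets 0 through 0xEFFF within the segment fit in a single signed 16-bit displacement. Anything beyond that needs the two-instruction form.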
As far as I can tell, nothing about TCBs, DTVs, etc. is part of the ABI, in the sense that applications are not permitted to access these structures, nor is the location of any TLS segment other than the main executable's part of the ABI. In both variants I and II, it makes sense to store implementation-internal information for the thread at a fixed offset from the "thread pointer register", in whichever way safely avoids overlapping the TLS segment.
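One way to meet that constraint is to carve the main executable's TLS segment and a private per-thread structure out of a single allocation. The structure is placed at a fixed (per-program) offset from the thread pointer, on the opposite side of TP from the TLS segment. The sketch assumes a hypothetical struct tcb, applies the rounding rule described for variant II, and treats the variant I bias as a parameter because it varies by arch. It only computes the layout and does not install the thread pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical implementation-private per-thread data (not ABI). */
struct tcb {
    void *self;   /* e.g. a self-pointer for cheap "current thread" lookups */
    int tid;
};

struct layout {
    unsigned char *base;  /* the single allocation, to be freed later */
    unsigned char *tls;   /* start of the main executable's TLS segment */
    uintptr_t tp;         /* value to load into the thread pointer */
    struct tcb *tcb;      /* where the private data lives */
};

static size_t round_up(size_t n, size_t a) { return (n + a - 1) & ~(a - 1); }

/* Advance p to the next multiple of a (a power of two). */
static unsigned char *align_ptr(unsigned char *p, size_t a)
{
    size_t mis = (uintptr_t)p & (a - 1);
    return mis ? p + (a - mis) : p;
}

/* Variant II: [ TLS segment | padding | tcb ], with tp == tcb. The TLS
   segment lies entirely below tp, the tcb at or above it. */
static struct layout layout_variant2(size_t memsz, size_t align)
{
    if (align < _Alignof(struct tcb))
        align = _Alignof(struct tcb);
    size_t tls_span = round_up(memsz, align);
    struct layout l;
    l.base = calloc(1, align - 1 + tls_span + sizeof(struct tcb));
    assert(l.base != NULL);
    l.tls = align_ptr(l.base, align);
    l.tcb = (struct tcb *)(l.tls + tls_span);
    l.tp = (uintptr_t)l.tcb;
    l.tcb->self = l.tcb;
    return l;
}

/* Variant I: [ tcb | padding | TLS segment ], with tp == tls + bias. The
   tcb sits below the segment, at offset -(bias + tcb_span) from tp. */
static struct layout layout_variant1(size_t memsz, size_t align, size_t bias)
{
    if (align < _Alignof(struct tcb))
        align = _Alignof(struct tcb);
    size_t tcb_span = round_up(sizeof(struct tcb), align);
    struct layout l;
    l.base = calloc(1, align - 1 + tcb_span + memsz);
    assert(l.base != NULL);
    l.tcb = (struct tcb *)align_ptr(l.base, align);
    l.tls = (unsigned char *)l.tcb + tcb_span;
    l.tp = (uintptr_t)l.tls + bias;   /* may point past the allocation */
    l.tcb->self = l.tcb;
    return l;
}
```

The two can never overlap, whatever the segment's size, because the tcb sits on the far side of TP from the TLS segment: above it in variant II, below it in variant I.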