This question made me question a practice I had been following for years.
For thread-safe initialization of function-local static const objects I protect t
In brief, I think that:
The object initialization is thread-safe, assuming that "some_mutex" is fully constructed when entering "create_const_thingy".
The initialization of the object reference inside "use_const_thingy" is not guaranteed to be thread-safe. It might (as you say) be initialized multiple times (which is less of a problem), but it might also be subject to word tearing, which could result in undefined behaviour.
[I assume that the C++ reference is implemented as a pointer to the actual object, and that this pointer value could in theory be read while only partially written.]
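For reference, the pattern being discussed looks roughly like the sketch below. It is a reconstruction, not the original question's code: `some_type` and the small `lock` class are hypothetical, and a POSIX `pthread_mutex_t` initialized with `PTHREAD_MUTEX_INITIALIZER` stands in for `some_mutex`, since C++98 has no standard mutex (static initialization is also what makes the "fully constructed" assumption hold). Compiled as C++11 or later, the function-local statics here are already initialized thread-safely, so the race only exists with C++98-era compilers.

```cpp
#include <pthread.h>

// Hypothetical payload type standing in for some_type.
struct some_type {
    int value;
    some_type() : value(42) {}
};

// Statically initialized, so usable before any thread starts.
static pthread_mutex_t some_mutex = PTHREAD_MUTEX_INITIALIZER;

// Minimal C++98-style scoped lock (hypothetical helper).
class lock {
public:
    explicit lock(pthread_mutex_t& m) : m_(m) { pthread_mutex_lock(&m_); }
    ~lock() { pthread_mutex_unlock(&m_); }
private:
    pthread_mutex_t& m_;
    lock(const lock&);             // non-copyable
    lock& operator=(const lock&);
};

const some_type& create_const_thingy() {
    lock guard(some_mutex);                    // serializes construction
    static const some_type the_const_thingy;
    return the_const_thingy;
}

int use_const_thingy() {
    // This reference is initialized outside the lock: the part in question.
    static const some_type& the_const_thingy = create_const_thingy();
    return the_const_thingy.value;
}
```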
So, to try and answer your question:
Safe enough in practice: Very likely, but ultimately depends on pointer size, processor architecture and code generated by the compiler. The crux here is likely to be whether a pointer-sized write/read is atomic or not.
Safe according to the rule: Well, there are no such rules in C++98, sorry (but you knew that already).
Update: After posting this answer I realized that it only focuses on a small, esoteric part of the real problem, and because of this decided to post another answer instead of editing the contents. I'm leaving the contents "as-is" as it has some relevance to the question (and also to humble myself, reminding me to think through things a bit more before answering).
I am not a standardista...
But for the use you mention, why not simply initialize them before any thread is created? Many singleton issues are caused by people using the idiomatic "single-threaded" lazy initialization when they could simply instantiate the value when the library is loaded (like a typical global).
The lazy fashion only makes sense if you use this value from another 'global'.
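A minimal sketch of the eager alternative, assuming a hypothetical `Config` type: the object lives at namespace scope, so in practice it is constructed during static initialization, before `main()` and before any thread can exist.

```cpp
// Hypothetical type standing in for whatever the singleton holds.
struct Config {
    int verbosity;
    Config() : verbosity(3) {}
};

namespace {
// Namespace-scope object: constructed before main() in practice,
// so no thread can observe it half-built.
const Config g_config;
}  // namespace

const Config& get_config() { return g_config; }
```

The lazy, function-local form is still needed when another global's constructor in a different translation unit calls `get_config()`, because the order of dynamic initialization across translation units is unspecified.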
On the other hand, another method I've seen was to use some kind of coordination, though I may not be describing it accurately.
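One concrete form of such coordination, shown here as a swapped-in C++11 technique rather than necessarily the one meant above, is `std::call_once`. The names `some_type`, `thingy_flag` and `make_thingy` are hypothetical; the construction counter only exists to make the "exactly once" property checkable.

```cpp
#include <mutex>

// Hypothetical payload type; `constructions` counts how often it is built.
struct some_type {
    static int constructions;
    int value;
    some_type() : value(42) { ++constructions; }
};
int some_type::constructions = 0;

namespace {
std::once_flag thingy_flag;
const some_type* thingy = nullptr;

// Runs exactly once; the object is intentionally never deleted.
void make_thingy() { thingy = new some_type(); }
}  // namespace

const some_type& get_thingy() {
    // Every caller, including those that raced with the first one,
    // returns only after make_thingy has completed.
    std::call_once(thingy_flag, make_thingy);
    return *thingy;
}
```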
So, the relevant part of the spec is 6.7/4:
An implementation is permitted to perform early initialization of other local objects with static storage duration under the same conditions that an implementation is permitted to statically initialize an object with static storage duration in namespace scope (3.6.2). Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization.
Assuming the second part holds (the object is initialized the first time control passes through its declaration), your code can be considered thread-safe.
Reading through 3.6.2, it appears the early initialization permitted is converting dynamic-initialization to static-initialization. Since static-initialization must happen before any dynamic-initialization and since I can't think of any way to create a thread until you get to dynamic-initialization, such an early initialization would also guarantee the constructor would get called a single time.
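The distinction can be seen in a small sketch with hypothetical names. The opening sentences of 6.7/4, not quoted above, require a local static of POD type initialized with constant expressions to be initialized before its block is first entered; a local static that needs a constructor call is instead initialized dynamically, the first time control passes through its declaration.

```cpp
#include <string>

struct Point { int x, y; };  // POD aggregate

const Point& origin() {
    // POD initialized with constant expressions: set up before the block
    // is first entered, so no initialization code runs on the first call.
    static const Point p = { 0, 0 };
    return p;
}

const std::string& thingy_name() {
    // Requires a constructor call: dynamically initialized on the first
    // pass through this declaration (early initialization is permitted
    // only under the restrictive conditions of 3.6.2).
    static const std::string name("the_const_thingy");
    return name;
}
```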
Update
So, with respect to calling the some_type constructor for the_const_thingy, your code is correct according to the rules.
This leaves the issue about overwriting the reference which is definitely not covered by the spec. That said, if you are willing to assume that references are implemented via pointers (which I believe is the most common way to do that), then all you are going to do is overwrite a pointer with the value that it already holds. So my take is that this should be safe in practice.
I've programmed enough interprocess sockets to have nightmares. In order to make anything thread-safe on a CPU with DDR RAM, you have to cache-line-align the data structure and pack up all of your global variables contiguously into as few cache lines as possible.
The problem with unaligned interprocess data and loosely packed globals is that they cause aliasing from cache misses. In CPUs that use DDR RAM, there are (usually) a bunch of 64-byte cache lines. When you load a cache line, the DDR RAM will automatically load up a bunch more cache lines, but the first cache line is always the hottest. What happens with interrupts that occur at high speeds is that the cache page will act as a low-pass filter, just like in analog signals, and will filter out the interrupt data, leading to COMPLETELY baffling bugs if you're not aware of what's going on. The same thing goes for global variables that are not packed tightly: if they take up multiple cache lines they will get out of sync, unless you take a snapshot of the critical interprocess variables and pass them through on the stack and in the registers to ensure the data is synced up right.
The .bss section (i.e. where zero-initialized global variables are stored) will get initialized to all zeros, but the compiler will not cache-line-align the data for you; you will have to do that yourself, which may also be a good place to use C++ placement new ("construct in place"). To learn the math behind the fastest way to align pointers, read this article; I'm trying to figure out if I came up with that trick. Here is what the code will look like:
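```cpp
#include <cstdint>  // std::uintptr_t
#include <new>      // placement new

// Round buffer up to the next 64-byte cache-line boundary.
inline char* AlignCacheLine (char* buffer) {
    std::uintptr_t offset = ((~reinterpret_cast<std::uintptr_t> (buffer)) + 1) & (63);
    return buffer + offset;
}

// Construct a SomeType in place at the aligned address inside buffer.
char SomeTypeInit (char* buffer, int param_1, int param_2, int param_3) {
    new (AlignCacheLine (buffer)) SomeType (param_1, param_2, param_3);
    return static_cast<char> (0xff);
}

const SomeType* create_const_thingy () {
    // 63 spare bytes guarantee an aligned SomeType fits in the buffer.
    static char interprocess_socket[sizeof (SomeType) + 63];
    static char dead_byte = SomeTypeInit (interprocess_socket, 1, 2, 3);
    (void) dead_byte;  // only exists to run SomeTypeInit once
    return reinterpret_cast<const SomeType*> (AlignCacheLine (interprocess_socket));
}
```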
In my experience, you will have to use a pointer, not a reference.
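The same rounding trick, plus a C++11 alternative, in a sketch with hypothetical names (`align_up_64`, `Payload`, `PayloadStorage`). `(~addr + 1)` is the two's-complement negation of `addr`, so masking it with 63 gives the distance to the next multiple of 64; for example, 0x1005 + 59 = 0x1040. Since C++11, `alignas(64)` lets the compiler align static storage directly, which removes the need for the 63 spare bytes.

```cpp
#include <cstdint>
#include <new>

// Distance to the next multiple of 64 is (-addr) & 63 (0 if already aligned).
inline std::uintptr_t align_up_64(std::uintptr_t addr) {
    return addr + ((~addr + 1) & 63);
}

// Hypothetical payload type.
struct Payload {
    int a, b, c;
    Payload(int a_, int b_, int c_) : a(a_), b(b_), c(c_) {}
};

// The storage itself starts on a 64-byte boundary.
struct alignas(64) PayloadStorage {
    unsigned char bytes[sizeof(Payload)];
};

const Payload* create_aligned_payload() {
    static PayloadStorage storage;
    static const Payload* p = new (storage.bytes) Payload(1, 2, 3);
    return p;
}
```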
This seems to be the easiest/cleanest approach I can think of without needing all of the mutex shenanigans:
```cpp
// Return a reference, not a copy, so every caller shares the one instance.
static My_object& My_object_instance()
{
    static My_object object;
    return object;
}

// Ensures that the instance is created before main starts and creates any threads,
// thereby guaranteeing serialization of static instance creation.
// Note: __attribute__((constructor)) is a GCC/Clang extension.
__attribute__((constructor))
void construct_my_object()
{
    My_object_instance();
}
```
This is my second attempt at an answer. I'll only answer the first of your questions:
- safe enough in practice?
No. As you state yourself, you're only ensuring that the object creation is protected, not the initialization of the reference to the object.
In the absence of a C++98 memory model and of explicit statements from the compiler vendor, there are no guarantees that the write to the memory representing the actual reference and the write to the memory that holds the initialization flag for the reference (if that is how it is implemented) are seen in the same order by multiple threads.
As you also say, overwriting the reference several times with the same value should make no semantic difference (even in the presence of word tearing, which is generally unlikely and perhaps even impossible on your processor architecture) but there's one case where it matters: When more than one thread races to call the function for the first time during program execution. In this case it is possible for one or more of these threads to see the initialization flag being set before the actual reference is initialized.
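To make the ordering problem concrete, here is a hand-written, hypothetical model of what a pre-C++11 compiler might emit for `static const some_type& the_const_thingy = create_const_thingy();`. Real code generation varies by compiler and flags; `ref_initialized` and `ref_ptr` are illustrative names for the flag and the pointer behind the reference. Run single-threaded it behaves correctly; the comment marks where concurrent first callers can go wrong.

```cpp
// Hypothetical payload type.
struct some_type {
    int value;
    some_type() : value(42) {}
};

const some_type& create_const_thingy() {
    static const some_type the_const_thingy;  // locking omitted in this model
    return the_const_thingy;
}

// Model of the compiler-generated state: plain variables, no barriers.
static bool ref_initialized = false;   // the "initialization flag"
static const some_type* ref_ptr = 0;   // the reference, stored as a pointer

const some_type& use_const_thingy() {
    if (!ref_initialized) {
        ref_ptr = &create_const_thingy();
        // Another thread may observe this store before the store to
        // ref_ptr (compiler or CPU reordering) and dereference null.
        ref_initialized = true;
    }
    return *ref_ptr;
}
```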
You have a latent bug in your program and you need to fix it. As for optimizations, I'm sure there are many besides using the double-checked locking pattern.
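For comparison, a sketch of double-checked locking written correctly under the C++11 memory model, which C++98 lacked: the release store publishes the fully built object, and the acquire load on the fast path guarantees that a thread seeing the pointer also sees the object. `some_type`, `g_instance` and `g_mutex` are hypothetical names.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type.
struct some_type {
    int value;
    some_type() : value(42) {}
};

namespace {
std::atomic<const some_type*> g_instance(nullptr);
std::mutex g_mutex;
}  // namespace

const some_type& get_thingy() {
    // Fast path: acquire pairs with the release store below.
    const some_type* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_mutex);
        // Re-check under the lock; the mutex already orders this load.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new some_type();  // intentionally never deleted
            g_instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

Since C++11, a plain function-local `static const some_type the_const_thingy;` already carries this guarantee (concurrent callers wait for the initialization to complete), so hand-rolled double-checked locking is rarely needed there.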