Can a C/C++ compiler legally cache a variable in a register across a pthread library call?

问题

Suppose that we have the following bit of code:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void guarantee(bool cond, const char *msg) {
    if (!cond) {
        fprintf(stderr, "%s", msg);
        exit(1);
    }
}

bool do_shutdown = false;   // Not volatile!
pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t shutdown_cond_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called in Thread 1. Intended behavior is to block until
trigger_shutdown() is called. */
void wait_for_shutdown_signal() {

    int res;

    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");

    while (!do_shutdown) {   // while loop guards against spurious wakeups
        res = pthread_cond_wait(&shutdown_cond, &shutdown_cond_mutex);
        guarantee(res == 0, "Could not wait for shutdown cond");
    }

    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

/* Called in Thread 2. */
void trigger_shutdown() {

    int res;

    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");

    do_shutdown = true;

    res = pthread_cond_signal(&shutdown_cond);
    guarantee(res == 0, "Could not signal shutdown cond");

    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

Can a standards-compliant C/C++ compiler ever cache the value of do_shutdown in a register across the call to pthread_cond_wait()? If not, which standards/clauses guarantee this?

The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown. This seems rather improbable, but I know of no standard that prevents it.

In practice, do any C/C++ compilers cache the value of do_shutdown in a register across the call to pthread_cond_wait()?

Which function calls is the compiler guaranteed not to cache the value of do_shutdown across? It's clear that if the function is declared externally and the compiler cannot access its definition, it must make no assumptions about its behavior so it cannot prove that it does not access do_shutdown. If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?

回答1:

Of course the current C and C++ standards say nothing on the subject.

As far as I know, Posix still avoids formally defining a concurrency model (I may be out of date, though, in which case apply my answer only to earlier Posix versions). Therefore what it does say has to be read with a little sympathy - it does not precisely lay out the requirements in this area, but implementers are expected to "know what it means" and do something that makes threads usable.

When the standard says that mutexes "synchronize memory access", implementations must assume that this means changes made under the lock in one thread will be visible under the lock in other threads. In other words, it's necessary (although not sufficient) that synchronization operations include memory barriers of one kind or another, and necessary behaviour of a memory barrier is that it must assume globals can change.

Threads Cannot be Implemented as a Library covers some specific issues that are required for a pthreads to actually be usable, but are not explicitly stated in the Posix standard at the time of writing (2004). It becomes quite important whether your compiler-writer, or whoever defined the memory model for your implementation, agrees with Boehm what "usable" means, in terms of allowing the programmer to "reason convincingly about program correctness".

Note that Posix doesn't guarantee a coherent memory cache, so if your implementation perversely wants to cache do_something in a register in your code, then even if you marked it volatile, it might perversely choose not to dirty your CPU's local cache between the synchronizing operation and reading do_something. So if the writer thread is running on a different CPU with its own cache, you might not see the change even then.

That's (one reason) why threads cannot be implemented merely as a library. This optimization of fetching a volatile global only from local CPU cache would be valid in a single-threaded C implementation[*], but breaks multi-threaded code. Hence, the compiler needs to "know about" threads, and how they affect other language features (for an example outside pthreads: on Windows, where cache is always coherent, Microsoft spells out the additional semantics that it grants volatile in multi-threaded code). Basically, you have to assume that if your implementation has gone to the trouble of providing the pthreads functions, then it will go to the trouble of defining a workable memory model in which locks actually synchronize memory access.

If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?

Yes to all of this - if the object is non-volatile, and the compiler can prove that this thread doesn't modify it (either through its name or through an aliased pointer), and if no memory barriers occur, then it can reuse previous values. There can and will be other implementation-specific conditions that sometimes stop it, of course.

[*] provided that the implementation knows the global is not located at some "special" hardware address which requires that reads always go through cache to main memory in order to see the results of whatever hardware op affects that address. But to put a global at any such location, or to make its location special with DMA or whatever, requires implementation-specific magic. Absent any such magic the implementation in principle can sometimes know this.

回答2:

Since do_shutdown has external linkage there's no way the compiler could know what happens to it across the call (unless it had full visibility to the functions being called). So it would have to reload the value (volatile or not - threading has no bearing on this) after the call.

As far as I know there's nothing directly said about this in the standard, except that the (single-threaded) abstract machine the standard uses to define the behavior of expressions indicates that the variable needs to be read when it's accessed in an expression. The standard permits that reading of the variable to be optimized away only if the behavior can be proven to be "as if" it were reloaded. And that can happen only if the compiler can know that the value was not modified by the function call.

Also not that the pthread library does make certain guarantees about memory barriers for various functions, including pthread_cond_wait(): Does guarding a variable with a pthread mutex guarantee it's also not cached?

Now, if do_shutdown were static (no external linkage) and you have several threads that used that static variable defined in the same module (ie., the address of the static variable was never taken to be passed to another module), That might be a different story. for example, say that you have a single function that used such a variable, and started several thread instances running for that function. In that case, a standards conforming compiler implementation might cache the value across function calls since it could assume that nothing else could modify the value (the standard's abstract machine model doesn't include threading).

So in that case, you would have to use mechanisms to ensure that the value was reloaded across the call. Note that because of hardware intricacies, the volatile keyword might not be adequate to ensure correct memory access ordering - you should rely on APIs provided by pthreads or the OS to ensure that. (as a side-note, recent versions of Microsoft's compilers do document that volatile enforce full memory barriers, but I've read opinions that indicate this isn't required by the standard).

回答3:

The hand-waving answers are all wrong. Sorry to be harsh.

There is no way

The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown.

If you believe differently, please show proof: a complete C++ program such that a compiler not designed for MT could deduce that pthread_cond_wait does not modify do_shutdown.

It's absurd, a compiler cannot possibly understand what pthread_ functions do, unless it has built-in knowledge of POSIX threads.

回答4:

From my own work, I can say that yes, the compiler can cache values across pthread_mutex_lock/pthread_mutex_unlock. I spent most of a weekend tracing down a bug in a bit of code that was caused by a set of pointers assignments being cached and unavailable to the threads that needed them. As a quick test, I wrapped the assignments in a mutex lock/unlock, and the threads still did not have access to the proper pointer values. Moving the pointer assignments & associated mutex locking to a separate function did fix the problem.

来源：https://stackoverflow.com/questions/4476446/can-a-c-c-compiler-legally-cache-a-variable-in-a-register-across-a-pthread-lib

标签

c++

concurrency

standards

sequence-points