Does the C++ volatile keyword introduce a memory fence?

Asked by 一整个雨季, 2020-11-28 20:40

I understand that volatile informs the compiler that the value may be changed, but in order to accomplish this functionality, does the compiler need to introduce a memory fence around it?

13 answers
  • 2020-11-28 21:10

    The compiler needs to introduce a memory fence around volatile accesses if, and only if, that is necessary to make the uses for volatile specified in the standard work (setjmp, signal handlers, and so on) on that particular platform.

    Note that some compilers do go way beyond what's required by the C++ standard in order to make volatile more powerful or useful on those platforms. Portable code shouldn't rely on volatile to do anything beyond what's specified in the C++ standard.

  • 2020-11-28 21:14

    The compiler only inserts a memory fence on the Itanium architecture, as far as I know.

    The volatile keyword is really best used for asynchronous changes, e.g., signal handlers and memory-mapped registers; it is usually the wrong tool to use for multithreaded programming.
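    To illustrate the point (a minimal C++11 sketch, not part of the original answer): for cross-thread signalling, std::atomic provides both the "don't cache this" property and the ordering guarantees that volatile lacks.

```cpp
#include <atomic>
#include <thread>

// Sketch (assumes C++11 or later). The atomic flag both prevents the
// compiler from caching it AND orders the write to `payload` before the
// flag store, which volatile alone does not guarantee.
std::atomic<bool> ready(false);
int payload = 0;

void producer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publish it
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // wait for publish
    return payload;  // guaranteed to see 42: acquire pairs with release
}

int run_handshake() {
    std::thread t(producer);
    int seen = consumer();
    t.join();
    return seen;
}
```

    Replacing std::atomic<bool> with volatile bool here would compile, but would give no ordering guarantee for the write to payload.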

  • 2020-11-28 21:16

    This is largely from memory, and based on pre-C++11, without threads. But having participated in discussions on threading in the committee, I can say that there was never an intent by the committee that volatile could be used for synchronization between threads. Microsoft proposed it, but the proposal didn't carry.

    The key specification of volatile is that access to a volatile represents an "observable behavior", just like IO. In the same way the compiler cannot reorder or remove specific IO, it cannot reorder or remove accesses to a volatile object (or more correctly, accesses through an lvalue expression with volatile qualified type). The original intent of volatile was, in fact, to support memory mapped IO. The "problem" with this, however, is that it is implementation defined what constitutes a "volatile access". And many compilers implement it as if the definition was "an instruction which reads or writes to memory has been executed". Which is a legal, albeit useless definition, if the implementation specifies it. (I've yet to find the actual specification for any compiler.)

    Arguably (and it's an argument I accept), this violates the intent of the standard, since unless the hardware recognizes the addresses as memory mapped IO, and inhibits any reordering, etc., you can't even use volatile for memory mapped IO, at least on Sparc or Intel architectures. Nevertheless, none of the compilers I've looked at (Sun CC, g++ and MSC) output any fence or membar instructions. (About the time Microsoft proposed extending the rules for volatile, I think some of their compilers implemented their proposal, and did emit fence instructions for volatile accesses. I've not verified what recent compilers do, but it wouldn't surprise me if it depended on some compiler option. The version I checked—I think it was VS6.0—didn't emit fences, however.)

  • 2020-11-28 21:16

    While working through an online video tutorial on 3D graphics and game-engine development with modern OpenGL, we used volatile within one of our classes. The tutorial website can be found here, and the video working with the volatile keyword is video 98 of the Shader Engine series. These works are not my own; they are credited to Marek A. Krzeminski, MASc, and this is an excerpt from the video download page.

    "Since we can now have our games run in multiple threads it is important to synchronize data between threads properly. In this video I show how to create a volatile locking class to ensure volatile variables are properly synchronized..."

    And if you are subscribed to his website and have access to his videos, within this video he references the following article concerning the use of volatile in multithreaded programming.

    Here is the article from the link above: http://www.drdobbs.com/cpp/volatile-the-multithreaded-programmers-b/184403766

    volatile: The Multithreaded Programmer's Best Friend

    By Andrei Alexandrescu, February 01, 2001

    The volatile keyword was devised to prevent compiler optimizations that might render code incorrect in the presence of certain asynchronous events.

    I don't want to spoil your mood, but this column addresses the dreaded topic of multithreaded programming. If, as the previous installment of Generic<Programming> says, exception-safe programming is hard, it's child's play compared to multithreaded programming.

    Programs using multiple threads are notoriously hard to write, prove correct, debug, maintain, and tame in general. Incorrect multithreaded programs might run for years without a glitch, only to unexpectedly run amok because some critical timing condition has been met.

    Needless to say, a programmer writing multithreaded code needs all the help she can get. This column focuses on race conditions — a common source of trouble in multithreaded programs — and provides you with insights and tools on how to avoid them and, amazingly enough, have the compiler work hard at helping you with that.

    Just a Little Keyword

    Although both C and C++ Standards are conspicuously silent when it comes to threads, they do make a little concession to multithreading, in the form of the volatile keyword.

    Just like its better-known counterpart const, volatile is a type modifier. It's intended to be used in conjunction with variables that are accessed and modified in different threads. Basically, without volatile, either writing multithreaded programs becomes impossible, or the compiler wastes vast optimization opportunities. An explanation is in order.

    Consider the following code:

    class Gadget {
    public:
        void Wait() {
            while (!flag_) {
                Sleep(1000); // sleeps for 1000 milliseconds
            }
        }
        void Wakeup() {
            flag_ = true;
        }
        ...
    private:
        bool flag_;
    };
    

    The purpose of Gadget::Wait above is to check the flag_ member variable every second and return when that variable has been set to true by another thread. At least that's what its programmer intended, but, alas, Wait is incorrect.

    Suppose the compiler figures out that Sleep(1000) is a call into an external library that cannot possibly modify the member variable flag_. Then the compiler concludes that it can cache flag_ in a register and use that register instead of accessing the slower on-board memory. This is an excellent optimization for single-threaded code, but in this case, it harms correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever. This is because the change of flag_ will not be reflected in the register that caches flag_. The optimization is too ... optimistic.

    Caching variables in registers is a very valuable optimization that applies most of the time, so it would be a pity to waste it. C and C++ give you the chance to explicitly disable such caching. If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable. So all you have to do to make Gadget's Wait/Wakeup combo work is to qualify flag_ appropriately:

    class Gadget {
    public:
        ... as above ...
    private:
        volatile bool flag_;
    };
    

    Most explanations of the rationale and usage of volatile stop here and advise you to volatile-qualify the primitive types that you use in multiple threads. However, there is much more you can do with volatile, because it is part of C++'s wonderful type system.

    Using volatile with User-Defined Types

    You can volatile-qualify not only primitive types, but also user-defined types. In that case, volatile modifies the type in a way similar to const. (You can also apply const and volatile to the same type simultaneously.)

    Unlike const, volatile discriminates between primitive types and user-defined types. Namely, unlike classes, primitive types still support all of their operations (addition, multiplication, assignment, etc.) when volatile-qualified. For example, you can assign a non-volatile int to a volatile int, but you cannot assign a non-volatile object to a volatile object.
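    A quick sketch (not from the original article) of the type rules just described: a volatile-qualified primitive keeps all its built-in operators, whereas a volatile class object exposes only its volatile member functions.

```cpp
// Sketch illustrating the type rules above. Note this only demonstrates
// what the type system permits; reading and writing a volatile int like
// this is well-defined single-threaded, but is not thread safety.
int demo_volatile_int() {
    volatile int v = 1;
    int plain = 5;
    v = plain;   // assigning a non-volatile int to a volatile int: fine
    v = v + 2;   // arithmetic still available through the volatile lvalue
    return v;    // reads back through the volatile lvalue
}
```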

    Let's illustrate how volatile works on user-defined types on an example.

    class Gadget {
    public:
        void Foo() volatile;
        void Bar();
        ...
    private:
        String name_;
        int state_;
    };
    ...
    Gadget regularGadget;
    volatile Gadget volatileGadget;
    

    If you think volatile is not that useful with objects, prepare for some surprise.

    volatileGadget.Foo(); // ok, volatile fun called for
                          // volatile object
    regularGadget.Foo();  // ok, volatile fun called for
                          // non-volatile object
    volatileGadget.Bar(); // error! Non-volatile function called for
                          // volatile object!
    

    The conversion from a non-qualified type to its volatile counterpart is trivial. However, just as with const, you cannot make the trip back from volatile to non-qualified. You must use a cast:

    Gadget& ref = const_cast<Gadget&>(volatileGadget);
    ref.Bar(); // ok
    

    A volatile-qualified class gives access only to a subset of its interface, a subset that is under the control of the class implementer. Users can gain full access to that type's interface only by using a const_cast. In addition, just like constness, volatileness propagates from the class to its members (for example, volatileGadget.name_ and volatileGadget.state_ are volatile variables).

    volatile, Critical Sections, and Race Conditions

    The simplest and the most often-used synchronization device in multithreaded programs is the mutex. A mutex exposes the Acquire and Release primitives. Once you call Acquire in some thread, any other thread calling Acquire will block. Later, when that thread calls Release, precisely one thread blocked in an Acquire call will be released. In other words, for a given mutex, only one thread can get processor time in between a call to Acquire and a call to Release. The executing code between a call to Acquire and a call to Release is called a critical section. (Windows terminology is a bit confusing because it calls the mutex itself a critical section, while "mutex" is actually an inter-process mutex. It would have been nice if they were called thread mutex and process mutex.)

    Mutexes are used to protect data against race conditions. By definition, a race condition occurs when the effect of more threads on data depends on how threads are scheduled. Race conditions appear when two or more threads compete for using the same data. Because threads can interrupt each other at arbitrary moments in time, data can be corrupted or misinterpreted. Consequently, changes and sometimes accesses to data must be carefully protected with critical sections. In object-oriented programming, this usually means that you store a mutex in a class as a member variable and use it whenever you access that class' state.

    Experienced multithreaded programmers might have yawned reading the two paragraphs above, but their purpose is to provide an intellectual workout, because now we will link with the volatile connection. We do this by drawing a parallel between the C++ types' world and the threading semantics world.

    • Outside a critical section, any thread might interrupt any other at any time; there is no control, so consequently variables accessible from multiple threads are volatile. This is in keeping with the original intent of volatile — that of preventing the compiler from unwittingly caching values used by multiple threads at once.
    • Inside a critical section defined by a mutex, only one thread has access. Consequently, inside a critical section, the executing code has single-threaded semantics. The controlled variable is not volatile anymore — you can remove the volatile qualifier.

    In short, data shared between threads is conceptually volatile outside a critical section, and non-volatile inside a critical section.

    You enter a critical section by locking a mutex. You remove the volatile qualifier from a type by applying a const_cast. If we manage to put these two operations together, we create a connection between C++'s type system and an application's threading semantics. We can make the compiler check race conditions for us.

    LockingPtr

    We need a tool that collects a mutex acquisition and a const_cast. Let's develop a LockingPtr class template that you initialize with a volatile object obj and a mutex mtx. During its lifetime, a LockingPtr keeps mtx acquired. Also, LockingPtr offers access to the volatile-stripped obj. The access is offered in a smart pointer fashion, through operator-> and operator*. The const_cast is performed inside LockingPtr. The cast is semantically valid because LockingPtr keeps the mutex acquired for its lifetime.

    First, let's define the skeleton of a class Mutex with which LockingPtr will work:

    class Mutex {
    public:
        void Acquire();
        void Release();
        ...    
    };
    

    To use LockingPtr, you implement Mutex using your operating system's native data structures and primitive functions.
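    For instance (a sketch that assumes C++11's std::mutex rather than raw OS primitives, which the article predates), the skeleton can be filled in by delegating:

```cpp
#include <mutex>

// Sketch: the article's Mutex skeleton, filled in by delegating to
// std::mutex (assumes C++11) instead of OS-specific primitives.
class Mutex {
public:
    void Acquire() { m_.lock(); }
    void Release() { m_.unlock(); }
private:
    std::mutex m_;
};

// Tiny sanity check: acquire/release on a fresh mutex must not deadlock.
bool acquire_release_ok() {
    Mutex mtx;
    mtx.Acquire();
    mtx.Release();
    return true;
}
```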

    LockingPtr is templated with the type of the controlled variable. For example, if you want to control a Widget, you use a LockingPtr that you initialize with a variable of type volatile Widget.

    LockingPtr's definition is very simple. LockingPtr implements an unsophisticated smart pointer. It focuses solely on collecting a const_cast and a critical section.

    template <typename T>
    class LockingPtr {
    public:
        // Constructors/destructors
        LockingPtr(volatile T& obj, Mutex& mtx)
          : pObj_(const_cast<T*>(&obj)), pMtx_(&mtx) {
            mtx.Acquire();
        }
        ~LockingPtr() {
            pMtx_->Release();
        }
        // Pointer behavior
        T& operator*() {    
            return *pObj_;    
        }
        T* operator->() {   
            return pObj_;   
        }
    private:
        T* pObj_;
        Mutex* pMtx_;
        LockingPtr(const LockingPtr&);
        LockingPtr& operator=(const LockingPtr&);
    };
    

    In spite of its simplicity, LockingPtr is a very useful aid in writing correct multithreaded code. You should define objects that are shared between threads as volatile and never use const_cast with them — always use LockingPtr automatic objects. Let's illustrate this with an example.

    Say you have two threads that share a vector object:

    class SyncBuf {
    public:
        void Thread1();
        void Thread2();
    private:
        typedef vector<char> BufT;
        volatile BufT buffer_;
        Mutex mtx_; // controls access to buffer_
    };
    

    Inside a thread function, you simply use a LockingPtr to get controlled access to the buffer_ member variable:

    void SyncBuf::Thread1() {
        LockingPtr<BufT> lpBuf(buffer_, mtx_);
        BufT::iterator i = lpBuf->begin();
        for (; i != lpBuf->end(); ++i) {
            ... use *i ...
        }
    }
    

    The code is very easy to write and understand — whenever you need to use buffer_, you must create a LockingPtr pointing to it. Once you do that, you have access to vector's entire interface.

    The nice part is that if you make a mistake, the compiler will point it out:

    void SyncBuf::Thread2() {
        // Error! Cannot access 'begin' for a volatile object
        BufT::iterator i = buffer_.begin();
        // Error! Cannot access 'end' for a volatile object
        for ( ; i != buffer_.end(); ++i ) {
            ... use *i ...
        }
    }
    

    You cannot access any function of buffer_ until you either apply a const_cast or use LockingPtr. The difference is that LockingPtr offers an ordered way of applying const_cast to volatile variables.

    LockingPtr is remarkably expressive. If you only need to call one function, you can create an unnamed temporary LockingPtr object and use it directly:

    unsigned int SyncBuf::Size() {
        return LockingPtr<BufT>(buffer_, mtx_)->size();
    }
    

    Back to Primitive Types

    We saw how nicely volatile protects objects against uncontrolled access and how LockingPtr provides a simple and effective way of writing thread-safe code. Let's now return to primitive types, which are treated differently by volatile.

    Let's consider an example where multiple threads share a variable of type int.

    class Counter {
    public:
        ...
        void Increment() { ++ctr_; }
        void Decrement() { --ctr_; }
    private:
        int ctr_;
    };
    

    If Increment and Decrement are to be called from different threads, the fragment above is buggy. First, ctr_ must be volatile. Second, even a seemingly atomic operation such as ++ctr_ is actually a three-stage operation. Memory itself has no arithmetic capabilities. When incrementing a variable, the processor:

    • Reads that variable into a register
    • Increments the value in the register
    • Writes the result back to memory

    This three-step operation is called RMW (Read-Modify-Write). During the Modify part of an RMW operation, most processors free the memory bus in order to give other processors access to the memory.

    If at that time another processor performs a RMW operation on the same variable, we have a race condition: the second write overwrites the effect of the first.

    To avoid that, you can rely, again, on LockingPtr:

    class Counter {
    public:
        ...
        void Increment() { ++*LockingPtr<int>(ctr_, mtx_); }
        void Decrement() { --*LockingPtr<int>(ctr_, mtx_); }
    private:
        volatile int ctr_;
        Mutex mtx_;
    };
    

    Now the code is correct, but its quality is inferior when compared to SyncBuf's code. Why? Because with Counter, the compiler will not warn you if you mistakenly access ctr_ directly (without locking it). The compiler compiles ++ctr_ if ctr_ is volatile, although the generated code is simply incorrect. The compiler is not your ally anymore, and only your attention can help you avoid race conditions.

    What should you do then? Simply encapsulate the primitive data that you use in higher-level structures and use volatile with those structures. Paradoxically, it's worse to use volatile directly with built-ins, in spite of the fact that initially this was the usage intent of volatile!
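    As a point of comparison (not part of the original article, which predates C++11): modern code would typically drop both volatile and the mutex for this case in favour of std::atomic, whose increments are single indivisible RMW operations.

```cpp
#include <atomic>
#include <thread>

// Sketch (assumes C++11). fetch_add is an atomic read-modify-write, so
// concurrent increments cannot lose updates the way ++ on a plain (or
// volatile) int can.
std::atomic<int> ctr(0);

void bump(int n) {
    for (int i = 0; i < n; ++i)
        ctr.fetch_add(1, std::memory_order_relaxed);
}

int run_counter() {
    std::thread t1(bump, 100000);
    std::thread t2(bump, 100000);
    t1.join();
    t2.join();
    return ctr.load();  // every increment is accounted for
}
```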

    volatile Member Functions

    So far, we've had classes that aggregate volatile data members; now let's think of designing classes that in turn will be part of larger objects and shared between threads. Here is where volatile member functions can be of great help.

    When designing your class, you volatile-qualify only those member functions that are thread safe. You must assume that code from the outside will call the volatile functions from any code at any time. Don't forget: volatile equals free multithreaded code and no critical section; non-volatile equals single-threaded scenario or inside a critical section.

    For example, you define a class Widget that implements an operation in two variants — a thread-safe one and a fast, unprotected one.

    class Widget {
    public:
        void Operation() volatile;
        void Operation();
        ...
    private:
        Mutex mtx_;
    };
    

    Notice the use of overloading. Now Widget's user can invoke Operation using a uniform syntax either for volatile objects and get thread safety, or for regular objects and get speed. The user must be careful about defining the shared Widget objects as volatile.

    When implementing a volatile member function, the first operation is usually to lock this with a LockingPtr. Then the work is done by using the non-volatile sibling:

    void Widget::Operation() volatile {
        LockingPtr<Widget> lpThis(*this, mtx_);
        lpThis->Operation(); // invokes the non-volatile function
    }
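    Putting the pieces together, here is a condensed, compilable sketch of the pattern (Mutex and LockingPtr abbreviated from above; one assumption not spelled out in the article is that inside a volatile member function mtx_ is itself volatile-qualified, so a const_cast is needed on the mutex too):

```cpp
#include <mutex>

// Condensed sketch of the article's pattern: a volatile overload that
// locks, then forwards to the fast, unprotected non-volatile sibling.
class Mutex {
public:
    void Acquire() { m_.lock(); }
    void Release() { m_.unlock(); }
private:
    std::mutex m_;  // assumption: C++11 std::mutex under the hood
};

template <typename T>
class LockingPtr {
public:
    LockingPtr(volatile T& obj, Mutex& mtx)
        : pObj_(const_cast<T*>(&obj)), pMtx_(&mtx) { mtx.Acquire(); }
    ~LockingPtr() { pMtx_->Release(); }
    T* operator->() { return pObj_; }
private:
    T* pObj_;
    Mutex* pMtx_;
};

class Widget {
public:
    void Operation() volatile {
        // mtx_ is volatile-qualified here, so strip volatile from it too
        LockingPtr<Widget> lpThis(*this, const_cast<Mutex&>(mtx_));
        lpThis->Operation();  // forwards to the non-volatile overload
    }
    void Operation() { ++calls_; }
    int Calls() const { return calls_; }
private:
    int calls_ = 0;
    Mutex mtx_;
};

int run_widget() {
    volatile Widget w;  // shared objects are declared volatile
    w.Operation();      // picks the volatile (locking) overload
    w.Operation();
    return const_cast<Widget&>(w).Calls();
}
```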
    

    Summary

    When writing multithreaded programs, you can use volatile to your advantage. You must stick to the following rules:

    • Define all shared objects as volatile.
    • Don't use volatile directly with primitive types.
    • When defining shared classes, use volatile member functions to express thread safety.

    If you do this, and if you use the simple generic component LockingPtr, you can write thread-safe code and worry much less about race conditions, because the compiler will worry for you and will diligently point out the spots where you are wrong.

    A couple of projects I've been involved with use volatile and LockingPtr to great effect. The code is clean and understandable. I recall a couple of deadlocks, but I prefer deadlocks to race conditions because they are so much easier to debug. There were virtually no problems related to race conditions. But then you never know.

    Acknowledgements

    Many thanks to James Kanze and Sorin Jianu who helped with insightful ideas.


    Andrei Alexandrescu is a Development Manager at RealNetworks Inc. (www.realnetworks.com), based in Seattle, WA, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (www.gotw.ca/cpp_seminar).

    This article might be a little dated, but it gives good insight into an excellent use of the volatile modifier in multithreaded programming, keeping events asynchronous while having the compiler check for race conditions for us. This may not directly answer the OP's original question about creating a memory fence, but I chose to post it as an answer for others, as an excellent reference toward a good use of volatile when working with multithreaded applications.

  • 2020-11-28 21:18

    What David is overlooking is the fact that the C++ standard specifies the behavior of several interacting threads only in specific situations; everything else results in undefined behavior. A race condition involving at least one write is undefined behavior if you don't use atomic variables.

    Consequently, the compiler is perfectly within its rights to forgo any synchronization instructions, since your CPU will only notice the difference in a program that exhibits undefined behavior due to missing synchronization.

  • 2020-11-28 21:18

    It doesn't have to. Volatile is not a synchronization primitive. It just disables optimisations, i.e., within a thread you get a predictable sequence of reads and writes, in the same order as prescribed by the abstract machine. But reads and writes in different threads have no order in the first place, so it makes no sense to speak of preserving or not preserving their order. The order between threads can be established by synchronization primitives; you get UB without them.

    A bit of explanation regarding memory barriers. A typical CPU has several levels of memory access. There is a memory pipeline, several levels of cache, then RAM etc.

    Membar instructions flush the pipeline. They don't change the order in which reads and writes are executed; they just force outstanding ones to complete at a given moment. This is useful for multithreaded programs, but not much otherwise.

    Caches are normally automatically coherent between CPUs. If one wants to make sure the cache is in sync with RAM, a cache flush is needed. It is very different from a membar.
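    For completeness (in C++11 terms, which postdate much of this discussion): the standard's portable spelling of such a barrier is std::atomic_thread_fence, which orders surrounding ordinary accesses when paired through an atomic flag.

```cpp
#include <atomic>
#include <thread>

// Sketch (assumes C++11). Standalone fences paired through a relaxed
// atomic flag: the release fence orders the write to `shared_data`
// before the flag store; the acquire fence orders the flag load before
// the subsequent read.
std::atomic<bool> flag(false);
int shared_data = 0;

void writer() {
    shared_data = 7;
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);
}

int reader() {
    while (!flag.load(std::memory_order_relaxed)) {}
    std::atomic_thread_fence(std::memory_order_acquire);
    return shared_data;  // guaranteed 7 once the flag is seen
}

int run_fence_demo() {
    std::thread t(writer);
    int seen = reader();
    t.join();
    return seen;
}
```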
