Why is an acquire barrier needed before deleting the data in an atomically reference counted smart pointer?

2021-02-08 05:08

Boost provides a sample atomically reference-counted shared pointer.

Here is the relevant code snippet and the explanation for the various orderings used:
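The question refers to the reference-counting example from the Boost.Atomic usage documentation, reproduced here (the comments paraphrase the documentation's explanation of the orderings):

    #include <boost/intrusive_ptr.hpp>
    #include <boost/atomic.hpp>

    class X {
    public:
        typedef boost::intrusive_ptr<X> pointer;
        X() : refcount_(0) {}

    private:
        mutable boost::atomic<int> refcount_;

        friend void intrusive_ptr_add_ref(const X* x)
        {
            // Relaxed is enough for the increment: a new reference can only
            // be formed from an existing one, and handing a reference to
            // another thread must already provide the required synchronization.
            x->refcount_.fetch_add(1, boost::memory_order_relaxed);
        }

        friend void intrusive_ptr_release(const X* x)
        {
            // Release: any access to the object through this reference
            // happens before the count reaches zero.
            if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
                // Acquire: synchronizes with the release decrements above, so
                // the deleting thread sees all other threads' accesses.
                boost::atomic_thread_fence(boost::memory_order_acquire);
                delete x;
            }
        }
    };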



        
3 Answers
  • 2021-02-08 05:15

    Consider two threads, each holding one of the last two references to the object:

    ------------------------------------------------------------
            Thread 1                              Thread 2
    ------------------------------------------------------------
       // play with x here
    
       fetch_sub(...)                            
                                                fetch_sub(...)
       // nothing
                                                delete x;
    

    You have to ensure that any changes made to the object by Thread 1 in // play with x here are visible to Thread 2 when it calls delete x;. For this you need an acquire fence, which, together with the memory_order_release on the fetch_sub() calls, guarantees that the changes made by Thread 1 are visible to Thread 2 before the deletion.
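    A minimal sketch of that pairing with std::atomic (release_ref is a made-up helper name, not from the question):

    #include <atomic>

    struct X { std::atomic<int> refcount{1}; /* ... payload ... */ };

    void release_ref(X* x)
    {
        // The release decrement publishes this thread's writes to *x.
        if (x->refcount.fetch_sub(1, std::memory_order_release) == 1) {
            // The acquire fence synchronizes with every earlier release
            // decrement, making all threads' writes to *x visible here.
            std::atomic_thread_fence(std::memory_order_acquire);
            delete x;
        }
    }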

  • 2021-02-08 05:30

    I think I found a rather simple example that shows why the acquire fence is needed.

    Let's assume our X looks like this:

    #include <atomic>     // atomic
    #include <cstdlib>    // free
    using namespace std;  // for brevity in the snippets below

    struct X
    {
        ~X() { free(data); }  // the destructor frees the payload
        void* data;
        atomic<int> refcount;
    };
    

    Let's further assume that we have two functions foo and bar that look like this (I'll inline the reference count decrements):

    void foo(X* x)
    {
        void* newData = generateNewData();  // produce a replacement payload
        free(x->data);                      // free the old payload
        x->data = newData;                  // install the new payload
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
            delete x;
    }
    
    void bar(X* x)
    {
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
            delete x;
    }
    

    The delete expression will run x's destructor and then free the memory occupied by x. Let's inline that too:

    void bar(X* x)
    {
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        {
            free(x->data);
            operator delete(x);
        }
    }
    

    Because there is no acquire fence, the compiler could decide to load the value of x->data into a register before executing the atomic decrement (the compiler may assume there is no data race, so the observable effect would be the same):

    void bar(X* x)
    {
        register void* r1 = x->data;  // the load was hoisted above the decrement
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        {
            free(r1);  // r1 still holds the pre-decrement value of x->data
            operator delete(x);
        }
    }
    

    Now let's assume that the refcount of x is 2 and that we have two threads. Thread 1 calls foo, thread 2 calls bar:

    1. Thread 2 loads x->data to a register.
    2. Thread 1 generates new data.
    3. Thread 1 frees the "old" data.
    4. Thread 1 assigns the new data to x->data.
    5. Thread 1 decrements refcount from 2 to 1.
    6. Thread 2 decrements refcount from 1 to 0.
    7. Thread 2 frees the "old" data again (a double free), instead of the new data, which is leaked.

    The key insight for me was that "prior writes [...] become visible in this thread" can mean something as trivial as "do not use values you cached in registers before the fence".
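    For completeness, here is a sketch of bar with the acquire fence the Boost code uses; the fence forbids the compiler from hoisting the load of x->data above the decrement, and on weakly ordered hardware it also forces foo's writes to become visible first:

    void bar(X* x)
    {
        // Do something unrelated to x
        if (x->refcount.fetch_sub(1, memory_order_release) == 1)
        {
            // Synchronizes with the release decrement in foo: x->data may
            // only be read after this point, so we free the new data.
            atomic_thread_fence(memory_order_acquire);
            free(x->data);
            operator delete(x);
        }
    }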

  • 2021-02-08 05:39

    From http://en.cppreference.com/w/cpp/atomic/memory_order:

    memory_order_acquire -- A load operation with this memory order performs the acquire operation on the affected memory location: prior writes made to other memory locations by the thread that did the release become visible in this thread.

    ...

    Release-Acquire ordering

    If an atomic store in thread A is tagged std::memory_order_release and an atomic load in thread B from the same variable is tagged std::memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B, that is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

    The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

    On strongly-ordered systems (x86, SPARC TSO, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or perform non-atomic loads earlier than the atomic load-acquire). On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions have to be used.

    This means that a release makes the current thread's pending writes available to other threads, while a later acquire makes the other threads' published writes visible in the acquiring thread.
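    As a concrete illustration of the quoted guarantee (payload and ready are made-up names):

    #include <atomic>
    #include <cassert>

    std::atomic<bool> ready{false};
    int payload = 0;

    void producer()  // runs in thread A
    {
        payload = 42;                                  // plain, non-atomic write
        ready.store(true, std::memory_order_release);  // release: publishes payload
    }

    void consumer()  // runs in thread B
    {
        while (!ready.load(std::memory_order_acquire)) // acquire: synchronizes with the store
            ;                                          // spin until the flag is seen
        assert(payload == 42);                         // guaranteed to hold
    }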

    On strongly-ordered systems this matters less. These orderings typically do not even generate extra fence instructions, because the hardware's cache-coherence protocol already keeps the caches consistent and the memory model is strong. But on weakly-ordered systems, while the atomic operations themselves are well defined, there can be pending writes to other parts of memory that are not yet visible to other cores.

    So, let's say threads A and B both share some data D.

    1. A holds a reference to D and does things to D
    2. A drops its reference (the release decrement)
    3. B drops its reference, finds a ref count of 0, and so decides to delete D
    4. B deletes D
    5. ... the writes from #1 are not visible to B yet, so bad things happen.

    With the acquire thread fence before the delete, the deleting thread synchronizes with the release operations of the other threads, so when the delete happens it sees what A did in #1.
