Atomic operations and code generation for gcc

I am currently looking at some assembly generated by gcc for atomic operations. I tried the following short sequence:

int x1;
int x2;

int foo;

void test()
{


                      
1 answer

Answered by 轻奢々 on 2021-01-01 22:44

The xchg instruction has implied lock semantics when one of its operands is a memory location. This means you can atomically swap the contents of a register with the contents of a memory location.
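
A minimal sketch of that swap, using GCC's `__atomic_exchange_n` builtin (the wrapper name `atomic_swap` is hypothetical). On x86, GCC typically emits this as a single `xchg` with a memory operand, and no explicit `lock` prefix is needed because of the implied lock semantics.

```c
int foo;

/* Hypothetical helper: atomically store new_value into *p and return the
   value that was there before. On x86, GCC usually compiles this to one
   xchg with a memory operand, which the CPU executes as a locked access. */
int atomic_swap(int *p, int new_value)
{
    return __atomic_exchange_n(p, new_value, __ATOMIC_SEQ_CST);
}
```

Compiling with `gcc -O2 -S` and reading the generated assembly is an easy way to check what your GCC version actually emits.
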
The example in the question is doing an atomic store, not a swap. The x86 memory model guarantees that, in a multi-processor/multi-core system, stores made by one thread are seen by other threads in the order in which they were performed, so a plain memory move (MOV) is sufficient. That said, some older Intel CPUs and some clones have bugs in this area, and on those CPUs an xchg is required as a workaround. See the "Significant optimizations" section of this Wikipedia article on spinlocks:
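
The difference between a store and a swap can be sketched with GCC's `__atomic` builtins, assuming a simple publish/consume pattern (`ready`, `publish` and `try_consume` are hypothetical names). A release store only has to stay ordered after the thread's earlier memory accesses, which x86 already guarantees for ordinary stores, so GCC can emit it as a plain `mov`. A `__ATOMIC_SEQ_CST` store is a stronger request and is usually compiled differently (for example `mov` followed by `mfence`, or `xchg`, depending on the compiler and its version).

```c
int x1;
int ready; /* hypothetical flag: 0 = nothing published yet */

/* Writer: store the payload, then raise the flag with a release store.
   On x86 both stores are plain movs: stores are not reordered with other
   stores, so no lock prefix or fence is required for release ordering. */
void publish(int value)
{
    __atomic_store_n(&x1, value, __ATOMIC_RELAXED);
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Reader: the acquire load pairs with the release store above, so once
   ready reads as 1 the earlier write to x1 is visible. On x86 this is also
   a plain mov. Returns -1 if nothing has been published yet. */
int try_consume(void)
{
    if (__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return __atomic_load_n(&x1, __ATOMIC_RELAXED);
    return -1;
}
```
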
http://en.wikipedia.org/wiki/Spinlock#Example_implementation
It states:

The simple implementation above works on all CPUs using the x86 architecture. However, a number of performance optimizations are possible:
On later implementations of the x86 architecture, spin_unlock can safely use an unlocked MOV instead of the slower locked XCHG. This is due to subtle memory ordering rules which support this, even though MOV is not a full memory barrier. However, some processors (some Cyrix processors, some revisions of the Intel Pentium Pro (due to bugs), and earlier Pentium and i486 SMP systems) will do the wrong thing and data protected by the lock could be corrupted. On most non-x86 architectures, explicit memory barrier or atomic instructions (as in the example) must be used. On some systems, such as IA-64, there are special "unlock" instructions which provide the needed memory ordering.
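
The spinlock pattern from the quoted passage can be sketched in C with GCC builtins: take the lock with an atomic exchange (an implicitly locked `xchg` on x86) and release it with a release store, which GCC emits as a plain `mov` on x86. The type and function names are hypothetical, and the inner read-only spin loop is an added refinement (test-and-test-and-set), not something the quoted text prescribes. This sketch relies on the documented x86 memory model and does not work around the CPU errata mentioned above.

```c
/* Hypothetical minimal spinlock: locked is 0 when free, 1 when held. */
typedef struct {
    int locked;
} simple_spinlock;

/* One attempt: the exchange writes 1 and returns the old value, so an old
   value of 0 means this caller is the one that took the lock. */
int spin_trylock(simple_spinlock *l)
{
    return __atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Acquire: retry the locked exchange until it succeeds. While the lock is
   held, spin on plain loads so the waiting core only reads the cache line
   instead of repeatedly issuing locked xchg instructions. */
void spin_lock(simple_spinlock *l)
{
    while (!spin_trylock(l)) {
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ; /* busy-wait */
    }
}

/* Release: a release store is enough. On x86 GCC emits a plain mov here,
   i.e. the unlocked MOV described in the quoted passage. */
void spin_unlock(simple_spinlock *l)
{
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}
```
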

The memory barrier, mfence, ensures that all earlier stores have completed (the store buffer in the CPU core has drained and the values are in the cache or memory); it also ensures that no later loads execute ahead of it, i.e. out of order.
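
The case where `mfence` matters is a store followed by a load of a different location, which is the one reordering x86 does allow. A sketch of a Dekker-style handshake, assuming two hypothetical flags `flag_a` and `flag_b`: each side raises its own flag, issues a full fence, then checks the other side's flag. GCC commonly emits `__atomic_thread_fence(__ATOMIC_SEQ_CST)` as `mfence` on x86, though some versions and `-mtune` settings use an equivalent locked instruction instead.

```c
int flag_a; /* hypothetical: set by side A when it wants to proceed */
int flag_b; /* hypothetical: set by side B when it wants to proceed */

/* Side A: announce, fence, then check B. Returns 1 if B has not announced.
   Without the fence, the store to flag_a could still be in the store buffer
   when the load of flag_b executes, so both sides could read 0. */
int announce_a_then_check_b(void)
{
    __atomic_store_n(&flag_a, 1, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* full barrier, e.g. mfence */
    return __atomic_load_n(&flag_b, __ATOMIC_RELAXED) == 0;
}

/* Side B mirrors side A with the flags swapped. With both fences in place,
   at least one side is guaranteed to see the other's flag set. */
int announce_b_then_check_a(void)
{
    __atomic_store_n(&flag_b, 1, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    return __atomic_load_n(&flag_a, __ATOMIC_RELAXED) == 0;
}
```
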
The fact that a MOV is sufficient to unlock a mutex (no serialization or memory barrier required) was "officially" clarified by an Intel architect in a reply to Linus Torvalds back in 1999:
http://lkml.org/lkml/1999/11/24/90
I assume it was later discovered that this did not hold for some older x86 processors.
                                                                        
                                                        
            
            
              
                