Does writing the same value to the same memory location cause a data race?

Consider the following code that writes the same value to the same memory location from multiple threads:

void f(int* buf, int n, int* p) {
    for(int i = 0


                      
3 Answers

没有蜡笔的小新, 2021-01-04 01:02

The key point here is, as @Maxim said, cache coherency. On a cache-coherent architecture a wrong result is indeed impossible.
However, it can go wrong on a machine without cache coherency. I don't know of a specific architecture, and although such machines are almost extinct due to natural selection, as far as I know some remain. (If you know an example, please comment.)
Here is a table that traces two threads filling a zeroed region of memory with ones. For brevity the example is scaled down by a factor of 32: each digit represents one of the 4-byte ints in question, so "set 1st bit" below means writing the first int. A cache line holds 4 ints, i.e. 4 digits. The rows marked "flush" are points where the on-chip cache is written back to main memory. In reality the timing is non-deterministic, since a flush may happen at any time, e.g. due to a preemptive task switch.
Core 1 cache              Memory                    Core 2 cache
------------------------------------------------------------------------------
                          0000
0000 (load cache)         0000
1000 (set 1st bit)        0000
1100 (set 2nd bit)        0000                      0000 (load cache)
**** (flush)              1100
                          1100                      1000 (set 1st bit)
                          1000                      **** (flush)
                          1000                      1000 (load cache)
                          1000                      1100 (set 2nd bit)
1000 (load cache)         1000                      1110 (set 3rd bit)
1010 (set 3rd bit)        1000                      1111 (set 4th bit)
1011 (set 4th bit)        1111                      **** (flush)
**** (flush)              1011

So main memory ends up holding 1011 instead of the expected 1111: the result is wrong.
I emphasize again that this counter-example is valid only on cache-incoherent machines.
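The schedule in the table can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: `Core`, `Line`, `show` and `replay` are hypothetical names made up for illustration, and the model assumes a write-back cache that always loads and flushes a whole 4-int line without any coherency traffic.

```cpp
#include <array>
#include <string>

using Line = std::array<int, 4>;  // one cache line = 4 ints, as in the table

// Hypothetical model of a core with a private, non-coherent write-back cache.
struct Core {
    Line cache{};
    void load(const Line& memory) { cache = memory; }   // fill the whole line
    void set(int i) { cache[i] = 1; }                   // the write only touches the cache
    void flush(Line& memory) const { memory = cache; }  // write back the whole line
};

// Renders a line as a digit string such as "1011".
std::string show(const Line& line) {
    std::string s;
    for (int v : line) s += static_cast<char>('0' + v);
    return s;
}

// Replays the interleaving from the table and returns the final memory contents.
std::string replay() {
    Line memory{};
    Core c1, c2;
    c1.load(memory);
    c1.set(0);
    c1.set(1);
    c2.load(memory);
    c1.flush(memory);  // memory: 1100
    c2.set(0);
    c2.flush(memory);  // memory: 1000, core 1's second write is lost
    c2.load(memory);
    c2.set(1);
    c1.load(memory);   // core 1 picks up the stale 1000
    c2.set(2);
    c1.set(2);
    c2.set(3);
    c1.set(3);
    c2.flush(memory);  // memory: 1111
    c1.flush(memory);  // memory: 1011, the stale line overwrites the good one
    return show(memory);
}
```

Calling `replay()` yields `"1011"`, matching the last row of the table. The loss comes from flushing whole lines: each core overwrites ints it never wrote itself with its own stale copy.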
                                                                        
                                                        
            
            
              
                
甜味超标, 2021-01-04 01:26

Memory models, as far as multithreading is concerned, specify when the effects of writes made by one thread become observable by another thread. In the code you posted both threads write the same values into the same memory locations, so it doesn't matter which thread's write the read of buf[n/2] observes; either will do.

Modern processors employ cache coherency protocols such as MESI. When the threads write to the buffer concurrently, the CPUs exchange a lot of messages to keep the cache lines holding the buffer synchronized, which makes the code run much slower than in the non-concurrent case (the same cache-line ping-pong that makes false sharing costly).

Here it doesn't matter whether the writes are atomic or not, since both threads write the same values to the same memory locations. There is a race, but it doesn't matter which thread wins, because the observed values are going to be the same even with partial writes.
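For comparison, the C++11 memory model formally treats two unsynchronized non-atomic writes to the same location as a data race even when the values are equal, regardless of what the hardware does. A minimal sketch of the same pattern with relaxed atomic stores, which the language does define, might look like this. The names `fill` and `run` are hypothetical, and storing `i` into slot `i` is an assumption, since the question's code is truncated.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's f: store the same value into each
// slot from every thread, then read the middle element back.
// Assumes buf is not empty.
void fill(std::vector<std::atomic<int>>& buf, int* out) {
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i].store(static_cast<int>(i), std::memory_order_relaxed);
    *out = buf[buf.size() / 2].load(std::memory_order_relaxed);
}

// Runs two writers over the same buffer and reports whether both read the
// expected middle value. n must be greater than 0.
bool run(std::size_t n) {
    std::vector<std::atomic<int>> buf(n);
    int x1 = -1, x2 = -1;
    std::thread t1(fill, std::ref(buf), &x1);
    std::thread t2(fill, std::ref(buf), &x2);
    t1.join();  // join() makes x1 and x2 safe to read afterwards
    t2.join();
    return x1 == x2 && x1 == static_cast<int>(n / 2);
}
```

Each thread reads `buf[n/2]` after its own store to that slot, and every store to that slot writes the same value, so `run(1000)` should return `true`. On mainstream hardware a relaxed store of an `int` typically compiles to a plain store instruction, so the atomic version is unlikely to cost more than the original.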
                                                                        
                                                        
            
            
              
                
醉梦人生, 2021-01-04 01:29

There is a race, but in your example both threads write the same values to the same addresses. Since you are not doing any read-modify-write operations, just writing predetermined numbers, this will be safe in most cases. Writing an int is a single atomic instruction on most systems. The exception would be running this code on an 8-bit microprocessor that uses a sequence of instructions to store an int. Even then it may still work, but that depends on how the library code performing the multi-byte store is implemented.
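The 8-bit case can be illustrated with a toy model in which a 32-bit store is split into four byte stores that two threads may interleave freely. This is a sketch only: `Step`, `apply` and `torn_schedule` are hypothetical names, and real multi-byte store routines may behave differently.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// One byte-sized store: which thread performs it and which byte it writes.
struct Step {
    int thread;
    int byte;
};

// Applies the byte stores in schedule order. Thread 0 writes bytes of v0,
// thread 1 writes bytes of v1. Returns the 32-bit value left in memory.
std::uint32_t apply(const std::vector<Step>& schedule,
                    std::uint32_t v0, std::uint32_t v1) {
    std::array<std::uint8_t, 4> mem{};
    for (const Step& s : schedule) {
        const std::uint32_t v = (s.thread == 0) ? v0 : v1;
        mem[s.byte] = static_cast<std::uint8_t>(v >> (8 * s.byte));
    }
    std::uint32_t out = 0;
    for (int b = 0; b < 4; ++b)
        out |= static_cast<std::uint32_t>(mem[b]) << (8 * b);
    return out;
}

// An interleaving in which the last writer alternates from byte to byte:
// thread 1 wins bytes 0 and 2, thread 0 wins bytes 1 and 3.
std::vector<Step> torn_schedule() {
    return {{0, 0}, {1, 0}, {1, 1}, {0, 1}, {0, 2}, {1, 2}, {1, 3}, {0, 3}};
}
```

With identical values, `apply(torn_schedule(), 0x11223344u, 0x11223344u)` gives back `0x11223344`, because every byte receives the same content no matter who writes it last. With different values, e.g. `0x11223344` and `0xAABBCCDD`, the same schedule leaves the torn value `0x11BB33DD`, which is why the argument only holds when both threads write the same number.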
                                                                        
                                                        
            
            
              
                