Thrust equivalent of OpenMP code


忘了有多久 2021-01-24 05:30

The code I'm trying to parallelize in OpenMP is a Monte Carlo simulation that boils down to something like this:

int seed = 0;
std::mt19937 rng(seed); 
double result = 0.;


      
      
        
1 answer

时光说笑 (original poster) 2021-01-24 06:16

Yes, it's possible to use thrust to do something similar, with (parallel) execution on the host CPU using OMP threads underneath the thrust OMP backend.  Here's one example:

$ cat t535.cpp
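#include <cstdlib>
#include <iostream>
#include <random>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/system/omp/execution_policy.h>
#include <thrust/system/omp/vector.h>

int main(int argc, char *argv[]){
  unsigned N = 1;   // number of random draws (first command-line argument)
  int seed = 0;     // RNG seed (second command-line argument)
  if (argc > 1)  N = atoi(argv[1]);
  if (argc > 2)  seed = atoi(argv[2]);
  std::mt19937 rng(seed);
  unsigned long result = 0;

  // vector owned by the OMP backend (host memory)
  thrust::omp::vector<unsigned long> vec(N);
  // fill the vector with draws from rng, using OMP threads
  thrust::generate(thrust::omp::par, vec.begin(), vec.end(), rng);
  // parallel sum of all elements
  result = thrust::reduce(thrust::omp::par, vec.begin(), vec.end());
  std::cout << result << std::endl;
  return 0;
}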
$ g++ -std=c++11 -O2 -I/usr/local/cuda/include -o t535 t535.cpp -fopenmp -lgomp
$ time ./t535 100000000
214746750809749347

real    0m0.700s
user    0m2.108s
sys     0m0.600s
$


For this test I used Fedora 20, with CUDA 6.5RC, running on a 4-core Xeon CPU (netting about a 3x speedup based on time results).  There are probably some further "optimizations" that could be made for this particular code, but I think they will unnecessarily clutter the idea, and I assume that your actual application is more complicated than just summing random numbers.

Much of what I show here was lifted from the Thrust direct system access page, but there are several comparable ways to access the OMP backend, depending on whether you want flexible, retargetable code or code that specifically uses the OMP backend (this example specifically targets the OMP backend).

The thrust::reduce operation guarantees the "atomicity" you are looking for: it guarantees that two threads never try to update a single location at the same time.  However, the use of std::mt19937 in a multithreaded OMP app is outside the scope of my answer, I think.  If I create an ordinary OMP app using the code you provided, I observe variability in the results, due (I think) to some interaction between the OMP threads sharing the one std::mt19937 rng.  This is not something Thrust can sort out for you.
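For what it's worth, one common way to sidestep the shared-engine problem in a plain OMP app is to stop sharing the engine.  The following is only a rough sketch, not the code from the question: it splits the draws into fixed-size blocks, gives every block a private std::mt19937, and lets the OMP reduction clause combine the partial sums.  The function name blocked_sum, the block size, and the (seed, block index) seeding scheme are all my own assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>

// Sum n draws from std::mt19937, processed in fixed-size blocks.
// Each block constructs its own engine, so no engine is shared by threads.
// blocked_sum and its seeding scheme are hypothetical, for illustration only.
std::uint64_t blocked_sum(std::uint64_t n, std::uint32_t seed,
                          std::uint64_t block = 1u << 16) {
  const std::int64_t nblocks =
      static_cast<std::int64_t>((n + block - 1) / block);
  std::uint64_t result = 0;

  // The reduction clause gives each thread a private partial sum and
  // combines them at the end. If the compiler ignores the pragma
  // (no -fopenmp), the loop simply runs serially.
  #pragma omp parallel for reduction(+ : result) schedule(static)
  for (std::int64_t b = 0; b < nblocks; ++b) {
    // Assumed seeding scheme: one seed sequence per (seed, block index).
    std::seed_seq seq{seed, static_cast<std::uint32_t>(b)};
    std::mt19937 rng(seq);

    const std::uint64_t begin = static_cast<std::uint64_t>(b) * block;
    const std::uint64_t end = std::min(begin + block, n);

    std::uint64_t local = 0;
    for (std::uint64_t i = begin; i < end; ++i) local += rng();
    result += local;
  }
  return result;
}
```

Calling blocked_sum(100000000, 0) from main and building with g++ -std=c++11 -O2 -fopenmp would be the analogue of the run above.  Because the blocks are tied to index ranges rather than to threads, and unsigned addition does not depend on the order of summation, the total should not change with the number of threads.  Whether seeding neighbouring blocks this way is statistically good enough for a real Monte Carlo is a separate question that I have not looked into.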

Thrust also has its own random number generators (in thrust/random.h, e.g. thrust::default_random_engine), which are designed to work with it.
    
             
                                                        
            
            
              
                