Thrust equivalent of OpenMP code

忘了有多久 2021-01-24 05:30

The code I'm trying to parallelize with OpenMP is a Monte Carlo simulation that boils down to something like this:

int seed = 0;
std::mt19937 rng(seed);
double result = 0.0;

1 Answer
  • 2021-01-24 06:16

    Yes, you can do something similar with Thrust, running in parallel on the host CPU through Thrust's OMP backend, which uses OpenMP threads underneath. Here's one example:

    $ cat t535.cpp
    #include <cstdlib>      // atoi
    #include <random>
    #include <iostream>
    #include <thrust/system/omp/execution_policy.h>
    #include <thrust/system/omp/vector.h>
    #include <thrust/generate.h>
    #include <thrust/reduce.h>
    
    int main(int argc, char *argv[]){
      unsigned N = 1;                 // number of samples (argv[1])
      int seed = 0;                   // RNG seed (argv[2])
      if (argc > 1)  N = atoi(argv[1]);
      if (argc > 2)  seed = atoi(argv[2]);
      std::mt19937 rng(seed);
      unsigned long result = 0;       // 64 bits on Linux (LP64)
    
      // vector whose storage is tagged for Thrust's OMP backend
      thrust::omp::vector<unsigned long> vec(N);
      // fill with random numbers, then sum them; both calls are dispatched to the OMP backend
      thrust::generate(thrust::omp::par, vec.begin(), vec.end(), rng);
      result = thrust::reduce(thrust::omp::par, vec.begin(), vec.end());
      std::cout << result << std::endl;
      return 0;
    }
    $ g++ -std=c++11 -O2 -I/usr/local/cuda/include -o t535 t535.cpp -fopenmp -lgomp
    $ time ./t535 100000000
    214746750809749347
    
    real    0m0.700s
    user    0m2.108s
    sys     0m0.600s
    $
    

    For this test I used Fedora 20 with CUDA 6.5RC on a 4-core Xeon CPU, netting about a 3x speedup judging by the user vs. real times reported by time. There are probably some further "optimizations" that could be made for this particular code, but I think they would unnecessarily clutter the idea, and I assume that your actual application is more complicated than just summing random numbers.
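    To put a number on the speedup, the Thrust program can be compared against a single-threaded run of the same work. A minimal sketch of such a serial baseline, using only the standard library, mirrors the generate + reduce steps with std::generate and std::accumulate. The function name serial_sum is hypothetical and not part of the answer's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical serial baseline: the same work as the Thrust program
// (fill a vector with n mt19937 draws, then sum it) on a single thread.
unsigned long long serial_sum(std::size_t n, unsigned seed) {
  std::mt19937 rng(seed);
  std::vector<unsigned long long> vec(n);
  // std::generate calls (its copy of) the engine once per element, in order.
  std::generate(vec.begin(), vec.end(), rng);
  // The 0ULL initial value makes accumulate add in 64-bit unsigned
  // arithmetic; a plain 0 would make it add in int and overflow.
  return std::accumulate(vec.begin(), vec.end(), 0ULL);
}
```

    Running serial_sum(100000000, 0) under time gives the single-threaded reference point for the same problem size.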

    Much of what I show here was lifted from the Thrust "direct system access" documentation page, but there are several comparable ways to access the OMP backend, depending on whether you want flexible, retargetable code or code that specifically uses the OMP backend (this example specifically targets the OMP backend).

    The thrust::reduce operation guarantees the "atomicity" you are looking for: it never has two threads trying to update a single location at the same time. However, the use of std::mt19937 in a multithreaded OMP app is, I think, outside the scope of my answer. If I create an ordinary OMP app using the code you provided, I observe variability in the results, due (I think) to multiple OMP threads calling the same std::mt19937 rng concurrently, which is a data race on the engine's internal state. This is not something Thrust can sort out for you.
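    One common way around the shared-generator problem, sketched here with plain OpenMP and std::mt19937 rather than Thrust, is to give every thread its own engine so no two threads ever touch the same state, and to let a reduction(+ : sum) clause take the place of the atomic update. The function name per_thread_sum and the seeding scheme (seed + thread number) are assumptions for illustration; simple offset seeds like this are not guaranteed to give statistically independent streams. Compiled without -fopenmp, the pragmas are ignored and the loop runs on one thread:

```cpp
#include <cassert>
#include <random>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: sums n draws, with one private std::mt19937 per
// OpenMP thread. Assumption: seeding thread t with (seed + t) is good
// enough for illustration, not for production-quality independent streams.
unsigned long long per_thread_sum(long long n, unsigned seed) {
  assert(n >= 0);
  unsigned long long sum = 0;
#pragma omp parallel reduction(+ : sum)
  {
    int tid = 0;
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif
    // Private to this thread, so there is no data race on engine state.
    std::mt19937 rng(seed + static_cast<unsigned>(tid));
#pragma omp for
    for (long long i = 0; i < n; ++i) {
      sum += rng();  // adds into this thread's private copy of sum
    }
  }  // the private sums are combined here by the reduction clause
  return sum;
}
```

    This removes the race, but the total now depends on how many threads run, because each thread draws from its own stream.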

    Thrust also has its own random number generators (in thrust/random.h, e.g. thrust::default_random_engine), which are designed to work with its algorithms.
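    Thrust's engines provide discard(), so each element or chunk of work can get its own engine positioned at a known offset in one shared sequence. The sketch below shows that skip-ahead idea without Thrust, using OpenMP and the standard library's std::minstd_rand; the helper names pow_mod and reproducible_sum and the fixed count of 64 chunks are assumptions for illustration. Because std::minstd_rand is a linear congruential generator (x_{k+1} = 48271 * x_k mod (2^31 - 1)), the state k steps ahead is a^k * x_0 mod m, which square-and-multiply computes in O(log k) steps. Every chunk starts exactly where the serial sequence would be, so the sum matches a single-threaded run for any thread count:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>

// Returns a^k mod m by square-and-multiply. With m = 2^31 - 1, every
// product of two residues fits comfortably in 64 bits.
std::uint64_t pow_mod(std::uint64_t a, std::uint64_t k, std::uint64_t m) {
  std::uint64_t result = 1;
  a %= m;
  while (k > 0) {
    if (k & 1) result = result * a % m;
    a = a * a % m;
    k >>= 1;
  }
  return result;
}

// Hypothetical helper: sums the first n outputs of std::minstd_rand(seed).
// The range is cut into a fixed number of chunks (not tied to the thread
// count); each chunk builds its own engine, jumped straight to its start.
unsigned long long reproducible_sum(long long n, unsigned seed) {
  assert(n >= 0);
  using engine = std::minstd_rand;  // x' = 48271 * x mod (2^31 - 1)
  const std::uint64_t a = engine::multiplier;
  const std::uint64_t m = engine::modulus;
  // Initial state as the engine computes it (a seed of 0 mod m maps to 1).
  std::uint64_t x0 = seed % m;
  if (x0 == 0) x0 = 1;

  const long long kChunks = 64;
  const long long chunk = (n + kChunks - 1) / kChunks;  // ceiling division
  unsigned long long sum = 0;
#pragma omp parallel for reduction(+ : sum)
  for (long long c = 0; c < kChunks; ++c) {
    const long long begin = c * chunk;
    const long long end = std::min(n, begin + chunk);
    if (begin >= end) continue;
    // State after 'begin' steps: x_begin = a^begin * x0 mod m (never 0).
    const std::uint64_t state =
        pow_mod(a, static_cast<std::uint64_t>(begin), m) * x0 % m;
    engine rng(static_cast<engine::result_type>(state));
    for (long long i = begin; i < end; ++i) sum += rng();
  }
  return sum;
}
```

    The skip is computed by hand because the standard only requires std::minstd_rand::discard(z) to cost no more than z calls to the engine. Summing integers also matters here: unsigned addition gives the same total in any order, whereas a floating-point sum could differ slightly with the thread count.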
