Thread safety while looping with OpenMP

前端 未结 3 1984
再見小時候
再見小時候 2021-01-27 10:41

I\'m working on a small Collatz conjecture calculator using C++ and GMP, and I\'m trying to implement parallelism on it using OpenMP, but I\'m coming across issues regarding thr

相关标签:
3条回答
  • 2021-01-27 11:24

    You could well be right about collisions with x. You can mark x as private by:

    #pragma omp parallel private(x)
    

    This way each thread gets their own "version" of the variable x, which should make this thread-safe. By default, variables declared before a #pragma omp parallel are public, so there is one shared instance between all of the threads.

    0 讨论(0)
  • 2021-01-27 11:26

    I assume what you want to do is to check if the collatz conjecture holds for all numbers. The program you posted is wrong on many levels both serially and in parallel.

    if (mpz_cmp_ui(x.get_mpz_t(), 1)) break;
    

    Means that it will break when x != 1. If you replace it with the correct 0 == mpz_cmp_ui, the code will just continue to test 2 over and over again. You have to have two variables anyway, one for the outer loop that represents what you want to check, and one for the inner loop performing the check. It's easier to get this right if you make a function for that:

    void check_collatz(mpz_class n) {
        while (n != 1) {
            n = collatz(n);
        }
    }
    
    int main() {
        mpz_class x = 1;
        while (true) {
            std::cout << x.get_str(10);
            check_collatz(x);
            x++;
        }
    }
    

    The while (true) loop is bad to reason about and parallelize, so let's just make an equivalent for loop:

    for (mpz_class x = 1;; x++) {
        check_collatz(x);
    }
    

    Now, we can talk about parallelizing the code. The basis for OpenMP parallelizing is a worksharing construct. You cannot just slap #pragma omp parallel on a while loop. Fortunately you can easily mark certain canonical for loops with #pragma omp parallel for. For that, however, you cannot use mpz_class as a loop variable, and you must specify an end for the loop:

    #pragma omp parallel for
    for (long check = 1; check <= std::numeric_limits<long>::max(); check++)
    {
        check_collatz(check);
    }
    

    Note that check is implicitly private, there is a copy for each thread working on it. Also OpenMP will take care of distributing the work [1 ... 2^63] among threads. When a thread calls check_collatz a new, private, mpz_class object will be created for it.

    Now, you might notice, that repeatedly creating a new mpz_class object in each loop iteration is costly (memory allocation). You can reuse that (by breaking check_collatz again) and creating a thread-private mpz_class working object. For this, you split the compound parallel for into separate parallel and for pragmas:

    #include <gmpxx.h>
    #include <iostream>
    #include <limits>
    
    // Avoid copying objects by taking and modifying a reference
    void collatz(mpz_class& n)
    {
        if (mpz_odd_p(n.get_mpz_t()))
        {
            n *= 3;
            n += 1;
        }
        else
        {
            n /= 2;
        }
    }
    
    int main()
    {
    #pragma omp parallel
        {
            mpz_class x;
    #pragma omp for
            for (long check = 1; check <= std::numeric_limits<long>::max(); check++)
            {
                // Note: The structure of this fits perfectly in a for loop.
                for (x = check; x != 1; collatz(x));
            }
        }
    }
    

    Note that declaring x in the parallel region will make sure it is implicitly private and properly initialized. You should prefer that to declaring it outside and marking it private. This will often lead to confusion because explicitly private variables from outside scope are unitialized.

    You might complain that this only checks the first 2^63 numbers. Just let it run. This gives you enough time to master OpenMP to expert level and write your own custom worksharing for GMP objects.

    You were concerned about having extra objects for each thread. This is essential for good performance. You cannot solve this efficiently with locks/critical sections/atomics. You would have to protect each and every read and write to your only relevant variable. There would be no parallelism left.

    Note: The huge for loop will likely have a load imbalance. So some threads will probably finish a few centuries earlier than the others. You could fix that with dynamic scheduling, or smaller static chunks.

    Edit: For academic sake, here is one idea how to implement the worksharing directly on GMP objects:

    #pragma omp parallel
        {
            // Note this is not a "parallel" loop
            // these are just separate loops on distinct strided 
            int nthreads = omp_num_threads();
            mpz_class check = 1;
            // we already checked those in the other program
            check += std::numeric_limits<long>::max(); 
            check += omp_get_thread_num();
            mpz_class x;
            for (; ; check += nthreads)
            {
                // Note: The structure of this fits perfectly in a for loop.
                for (x = check; x != 1; collatz(x));
            }
        }
    
    0 讨论(0)
  • 2021-01-27 11:27

    You might want to touch x only with atomic instructions.

    #pragma omp atomic
    x++;
    

    This ensures that all threads see the same value of x without requires mutexes or other synchronization techniques.

    0 讨论(0)
提交回复
热议问题