C++ 2011 : std::thread : simple example to parallelize a loop?

感动是毒 2020-12-13 07:14

C++ 2011 includes very cool new features, but I can't find many examples of parallelizing a for-loop. So my very naive question is: how do you parallelize a simple for loop with std::thread?

6 Answers
  • 2020-12-13 07:37

    Using this class you can do it as:

    Range based loop (read and write)
    pforeach(auto &val, container) { 
      val = sin(val); 
    };
    
    Index based for-loop
    auto new_container = container;
    pfor(size_t i, 0, container.size()) { 
      new_container[i] = sin(container[i]); 
    };
    
  • 2020-12-13 07:37

    AFAIK the simplest way to parallelize a loop, if you are sure that no concurrent accesses are possible, is to use OpenMP.

    It is supported by all major compilers except LLVM (as of August 2013).

    Example :

    for(int i = 0; i < n; ++i)
    {
       tab[i] *= 2;
       tab2[i] /= 2;
       tab3[i] += tab[i] - tab2[i];
    }
    

    This can be parallelized very easily, like this:

    #pragma omp parallel for
    for(int i = 0; i < n; ++i)
    {
       tab[i] *= 2;
       tab2[i] /= 2;
       tab3[i] += tab[i] - tab2[i];
    }
    

    However, be aware that this is only efficient with a large number of values.
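
    For reference, here is a complete, compilable version of the snippet above (the container type and sizes are made up for illustration). Remember that the pragma is silently ignored unless OpenMP is enabled at compile time, e.g. with g++ -fopenmp example.cpp:

    #include <vector>
    
    int main()
    {
        const int n = 1000000;                  // large enough for the parallelism to pay off
        std::vector<int> tab(n, 1), tab2(n, 4), tab3(n, 0);
    
        #pragma omp parallel for
        for(int i = 0; i < n; ++i)
        {
           tab[i] *= 2;
           tab2[i] /= 2;
           tab3[i] += tab[i] - tab2[i];
        }
        return 0;
    }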

    If you use g++, another very C++11-ish way of doing it would be to use a lambda with for_each and the GNU parallel extensions (which can use OpenMP behind the scenes):

    // needs #include <parallel/algorithm> and compilation with -fopenmp
    __gnu_parallel::for_each(std::begin(tab), std::end(tab), [&] (int &value)
    {
        stuff_of_your_loop(value);   // the lambda receives each element of tab
    });
    

    However, for_each is mainly intended for arrays, vectors, etc. But you can "cheat" it if you only want to iterate over a plain index range, by creating a Range class with begin and end methods whose iterator mostly just increments an int (see the sketch below).
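
    A minimal sketch of such a Range "cheat" (the class and member names here are made up for illustration):

    #include <algorithm>
    #include <cstddef>
    #include <iterator>
    
    // Wraps an integer interval [first, last) so that for_each-style
    // algorithms can walk over indices instead of container elements.
    class Range {
    public:
        class iterator {
        public:
            typedef std::input_iterator_tag iterator_category;
            typedef int                     value_type;
            typedef std::ptrdiff_t          difference_type;
            typedef const int*              pointer;
            typedef int                     reference;
    
            explicit iterator(int value) : value_(value) {}
            int operator*() const { return value_; }
            iterator& operator++() { ++value_; return *this; }
            bool operator==(const iterator& other) const { return value_ == other.value_; }
            bool operator!=(const iterator& other) const { return value_ != other.value_; }
        private:
            int value_;
        };
    
        Range(int first, int last) : first_(first), last_(last) {}
        iterator begin() const { return iterator(first_); }
        iterator end()   const { return iterator(last_);  }
    private:
        int first_, last_;
    };

    Used, for instance, as: Range indices(0, 100); std::for_each(indices.begin(), indices.end(), [&](int i) { tab[i] *= 2; });. Note that for the parallel overload to actually split the work, the iterator would additionally need to be random access; this sketch keeps it minimal.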

    Note that for simple loops that do mathematical stuff, the algorithms in #include <numeric> and #include <algorithm> can all be parallelized with G++.
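
    For instance, compiling with G++'s parallel mode enabled (-D_GLIBCXX_PARALLEL -fopenmp) lets a plain standard call such as the one below dispatch to a multi-threaded implementation, heuristics permitting (the data size here is made up):

    #include <iostream>
    #include <numeric>
    #include <vector>
    
    // build with: g++ -std=c++11 -D_GLIBCXX_PARALLEL -fopenmp sum.cpp
    int main()
    {
        std::vector<double> data(10000000, 0.5);
        // with parallel mode enabled, this accumulate may run on several threads
        std::cout << std::accumulate(data.begin(), data.end(), 0.0) << std::endl;
        return 0;
    }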

  • 2020-12-13 07:47

    Well, obviously it depends on what your loop does, how you choose to parallelize, and how you manage the threads' lifetimes.

    I'm reading the book by the author of the C++11 standard threading library (who is also one of the boost.thread maintainers and wrote Just::Thread), and I can see that "it depends".

    Now, to get an idea of the basics of the new standard threading, I would recommend reading the book, as it gives plenty of examples. Also, take a look at http://www.justsoftwaresolutions.co.uk/threading/ and https://stackoverflow.com/questions/415994/boost-thread-tutorials

  • 2020-12-13 07:47

    Define a macro using std::thread and a lambda expression:

    #include <thread>
    #include <vector>
    
    #ifndef PARALLEL_FOR
    // spawns one std::thread per loop index (so keep the range small),
    // runs the body O with the index named I, then joins all threads
    #define PARALLEL_FOR(INT_LOOP_BEGIN_INCLUSIVE, INT_LOOP_END_EXCLUSIVE,I,O)         \
        {                                                                               \
            int LOOP_LIMIT=INT_LOOP_END_EXCLUSIVE-INT_LOOP_BEGIN_INCLUSIVE;             \
            std::vector<std::thread> threads(LOOP_LIMIT);                               \
            auto fParallelLoop=[&](int I){ O; };                                        \
            for(int i=0; i<LOOP_LIMIT; i++)                                             \
            {                                                                           \
                threads[i]=std::thread(fParallelLoop,i+INT_LOOP_BEGIN_INCLUSIVE);       \
            }                                                                           \
            for(int i=0; i<LOOP_LIMIT; i++)                                             \
            {                                                                           \
                threads[i].join();                                                      \
            }                                                                           \
        }
    #endif
    

    usage:

    int aaa=0; // std::atomic<int> aaa;
    PARALLEL_FOR(0,90,i,
    {
        aaa+=i;
    });
    

    It's ugly, but it works (I mean the multi-threading part, not the non-atomic incrementing).
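
    If the sum itself needs to be correct, the commented-out std::atomic variant makes the accumulation well defined:

    #include <atomic>
    
    std::atomic<int> aaa{0};   // concurrent += is now an atomic fetch-add
    PARALLEL_FOR(0,90,i,
    {
        aaa += i;              // each of the 90 threads adds its own index safely
    });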

  • Can't provide a C++11 specific answer since we're still mostly using pthreads. But, as a language-agnostic answer, you parallelise something by setting it up to run in a separate function (the thread function).

    In other words, you have a function like:

    def processArraySegment (threadData):
        arrayAddr = threadData->arrayAddr
        startIdx  = threadData->startIdx
        endIdx    = threadData->endIdx
    
        for i = startIdx to endIdx:
            doSomethingWith (arrayAddr[i])
    
        exitThread()
    

    and, in your main code, you can process the array in two chunks:

    int xyzzy[100]
    
    threadData->arrayAddr = xyzzy
    threadData->startIdx  = 0
    threadData->endIdx    = 49
    threadData->done      = false
    tid1 = startThread (processArraySegment, threadData)
    
    // caveat coder: see below.
    threadData->arrayAddr = xyzzy
    threadData->startIdx  = 50
    threadData->endIdx    = 99
    threadData->done      = false
    tid2 = startThread (processArraySegment, threadData)
    
    waitForThreadExit (tid1)
    waitForThreadExit (tid2)
    

    (keeping in mind the caveat that you should ensure thread 1 has loaded the data into its local storage before the main thread starts modifying it for thread 2, possibly with a mutex or by using an array of structures, one per thread).

    In other words, it's rarely a simple matter of just modifying a for loop so that it runs in parallel, nice though that would be, with something like:

    for {threads=10} ({i} = 0; {i} < ARR_SZ; {i}++)
        array[{i}] = array[{i}] + 1;
    

    Instead, it requires a bit of rearranging your code to take advantage of threads.

    And, of course, you have to ensure that it makes sense for the data to be processed in parallel. If you're setting each array element to the previous one plus 1, no amount of parallel processing will help, simply because you have to wait for the previous element to be modified first.

    This particular example above simply uses an argument passed to the thread function to specify which part of the array it should process. The thread function itself contains the loop to do the work.
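
    For completeness, here is roughly what the same two-chunk idea looks like with C++11 std::thread (the function and variable names are just illustrative). Because the bounds are copied into each thread, the shared threadData caveat above disappears:

    #include <array>
    #include <thread>
    
    void processArraySegment(int* arrayAddr, int startIdx, int endIdx)
    {
        for (int i = startIdx; i <= endIdx; ++i)
            arrayAddr[i] += 1;                    // stand-in for doSomethingWith()
    }
    
    int main()
    {
        std::array<int, 100> xyzzy{};
    
        // each thread gets its own copies of the arguments,
        // so there is no shared threadData structure to protect
        std::thread t1(processArraySegment, xyzzy.data(), 0, 49);
        std::thread t2(processArraySegment, xyzzy.data(), 50, 99);
    
        t1.join();
        t2.join();
        return 0;
    }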

  • 2020-12-13 07:51

    std::thread is not necessarily meant to parallelize loops. It is meant to be the low-level abstraction on top of which constructs like a parallel_for algorithm can be built. If you want to parallelize your loops, you should either write a parallel_for algorithm yourself or use existing libraries which offer task-based parallelism.

    The following example shows how you could parallelize a simple loop, but it also shows the disadvantages: the missing load balancing and the complexity required even for a simple loop.

      // needs <thread>, <vector>, <numeric>, <iostream> and <iterator>
      typedef std::vector<int> container;
      typedef container::iterator iter;
    
      container v(100, 1);
    
      auto worker = [] (iter begin, iter end) {
        for(auto it = begin; it != end; ++it) {
          *it *= 2;
        }
      };
    
    
      // serial
      worker(std::begin(v), std::end(v));
    
      std::cout << std::accumulate(std::begin(v), std::end(v), 0) << std::endl; // 200
    
      // parallel
      std::vector<std::thread> threads(8);
      const int grainsize = v.size() / 8;
    
      auto work_iter = std::begin(v);
      for(auto it = std::begin(threads); it != std::end(threads) - 1; ++it) {
        *it = std::thread(worker, work_iter, work_iter + grainsize);
        work_iter += grainsize;
      }
      threads.back() = std::thread(worker, work_iter, std::end(v));
    
      for(auto&& i : threads) {
        i.join();
      }
    
      std::cout << std::accumulate(std::begin(v), std::end(v), 0) << std::endl; // 400
    

    Using a library which offers a parallel_for template, it can be simplified to

    parallel_for(std::begin(v), std::end(v), worker);
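
    Here is a minimal sketch of what such a parallel_for could look like, reusing the chunking pattern from the snippet above (the signature and the default thread count are assumptions, not a standard API):

    #include <iterator>
    #include <thread>
    #include <vector>
    
    // Splits [first, last) into roughly equal chunks, runs work(chunk_first, chunk_last)
    // on each chunk in its own std::thread, then joins them all.
    template <typename Iterator, typename Worker>
    void parallel_for(Iterator first, Iterator last, Worker work,
                      unsigned num_threads = std::thread::hardware_concurrency())
    {
        if (num_threads == 0) num_threads = 2;   // hardware_concurrency() may return 0
        const auto length    = std::distance(first, last);
        const auto grainsize = length / static_cast<decltype(length)>(num_threads);
    
        std::vector<std::thread> threads;
        Iterator chunk_first = first;
        // every thread but the last gets exactly `grainsize` elements
        for (unsigned i = 0; i + 1 < num_threads && grainsize > 0; ++i) {
            Iterator chunk_last = std::next(chunk_first, grainsize);
            threads.emplace_back(work, chunk_first, chunk_last);
            chunk_first = chunk_last;
        }
        threads.emplace_back(work, chunk_first, last);   // the last chunk takes the remainder
    
        for (auto& t : threads) {
            t.join();
        }
    }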
    