C++: how to elegantly use C++17 parallel execution with a for loop that counts an integer?

Question

I can do

std::vector<int> a;
a.reserve(1000);
for(int i=0; i<1000; i++)
    a.push_back(i);
std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int i) {
  // ... do something based on i ...
});

but is there a more elegant way of creating a parallelized version of for(int i=0; i<n; i++) that does not require me to first fill a vector with ascending ints?

Answer 1:

You could use std::generate to create a vector {0, 1, ..., 999}

std::vector<int> v(1000);
std::generate(v.begin(), v.end(), [n = 0] () mutable { return n++; });

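There is an overload that accepts an ExecutionPolicy, so it is tempting to modify the above to

std::vector<int> v(1000);
std::generate(std::execution::par, v.begin(), v.end(), [n = 0] () mutable { return n++; });

but this is not safe: with a parallel policy the library may copy the generator or call it from several threads concurrently, in an unspecified order. The mutable counter n then either races or restarts from 0 in each copy, so the values are not guaranteed to come out as 0, 1, ..., 999. Fill the vector with the sequential std::generate (or std::iota), and apply the parallel policy to the std::for_each that does the real work.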

Answer 2:

Although I can't suggest a way to avoid filling a vector, I can recommend using the std::iota() function as (perhaps) the most efficient/elegant way to fill it with incrementing integers:

std::vector<int> a(1000);
std::iota(std::begin(a), std::end(a), 0);
std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int i) {
  // ... do something based on i ...
});

The complexity of std::iota is exactly last - first increments and assignments, whereas the std::generate function has a complexity of last - first invocations of g() and assignments. Even if a decent compiler were to inline a simple increment lambda function for g, the iota syntax is considerably simpler, IMHO.
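Putting the iota approach together end to end may help. The sketch below assumes the work for each index is independent; compute_squares and sum_of_squares are hypothetical stand-ins for "do something based on i". The key point is that each parallel iteration writes only to its own slot out[i], so no locking is needed. When the per-index results only need to be combined, std::transform_reduce over the same indices avoids a shared accumulator altogether.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for "do something based on i": square each index.
// Each iteration writes only to out[i], so parallel iterations never share data.
// Assumes n >= 0.
std::vector<long long> compute_squares(int n) {
    std::vector<int> idx(static_cast<std::size_t>(n));
    std::iota(idx.begin(), idx.end(), 0);  // idx = {0, 1, ..., n-1}

    std::vector<long long> out(idx.size());
    std::for_each(std::execution::par_unseq, idx.begin(), idx.end(),
                  [&out](int i) { out[static_cast<std::size_t>(i)] = static_cast<long long>(i) * i; });
    return out;
}

// If the per-index results only need to be combined, std::transform_reduce over
// the same indices computes the total without an output vector or a shared counter.
long long sum_of_squares(int n) {
    std::vector<int> idx(static_cast<std::size_t>(n));
    std::iota(idx.begin(), idx.end(), 0);
    return std::transform_reduce(std::execution::par_unseq, idx.begin(), idx.end(), 0LL,
                                 std::plus<>{},
                                 [](int i) { return static_cast<long long>(i) * i; });
}
```

The index vector costs n ints of memory, which is exactly the overhead the question hopes to avoid; the counting-iterator answers address that.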

Answer 3:

Here are two ways to do it without pre-populating a vector just to store a sequence of integers.

You can do it with boost::counting_range from Boost.Range (or directly with boost::counting_iterator from Boost.Iterator, if you prefer) ... although good luck finding out how from reading the documentation.

 #include <boost/range/counting_range.hpp>

 auto range = boost::counting_range<int>(0,1000);
 std::for_each(std::execution::par_unseq,
               range.begin(),
               range.end(),
               [&](int i) {
                   //  ... do something based on i ...
               });

If you don't want to include Boost, we can write a simple version directly.

With no apology for munging iota and iterator together instead of coming up with a decent name, the below will let you write something similar to the Boost version above:

 std::for_each(std::execution::par_unseq,
               ioterable<int>(0),
               ioterable<int>(1000),
               [&](int i) {
                 //  ... do something based on i ...
               }
 );

You can see how much boilerplate you save by using Boost for this:

 #include <cstddef>
 #include <iterator>
 #include <limits>

 template <typename NumericType>
 struct ioterable
 {
     // The ExecutionPolicy overloads require at least forward iterators,
     // so advertise forward_iterator_tag rather than input_iterator_tag.
     using iterator_category = std::forward_iterator_tag;
     using value_type = NumericType;
     using difference_type = std::ptrdiff_t;
     // Forward iterators must yield a real reference, so hand out a
     // const reference to the stored value.
     using pointer = NumericType const*;
     using reference = NumericType const&;

     explicit ioterable(NumericType n) : val_(n) {}

     ioterable() = default;
     ioterable(ioterable&&) = default;
     ioterable(ioterable const&) = default;
     ioterable& operator=(ioterable&&) = default;
     ioterable& operator=(ioterable const&) = default;

     ioterable& operator++() { ++val_; return *this; }
     ioterable operator++(int) { ioterable tmp(*this); ++val_; return tmp; }
     bool operator==(ioterable const& other) const { return val_ == other.val_; }
     bool operator!=(ioterable const& other) const { return val_ != other.val_; }

     reference operator*() const { return val_; }
     pointer operator->() const { return &val_; }

 private:
     NumericType val_{ std::numeric_limits<NumericType>::max() };
 };
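ioterable only supports ++, so at best it is a forward iterator. Implementations are allowed to run forward-iterator ranges serially, and some only split the work across threads when the iterators are random access. The sketch below is a hypothetical random-access variant (index_iterator and squares are illustrative names, not a library API). Like Boost's counting_iterator, it stores the current value and returns a const reference to it.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <iterator>
#include <vector>

// Hypothetical random-access counting iterator: dereferencing yields the current value.
template <typename Int>
class index_iterator {
public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type = Int;
    using difference_type = std::ptrdiff_t;
    using pointer = const Int*;
    using reference = const Int&;  // refers to the value stored inside this iterator

    index_iterator() = default;
    explicit index_iterator(Int v) : val_(v) {}

    reference operator*() const { return val_; }
    pointer operator->() const { return &val_; }
    value_type operator[](difference_type n) const { return static_cast<Int>(val_ + n); }

    index_iterator& operator++() { ++val_; return *this; }
    index_iterator operator++(int) { index_iterator t(*this); ++val_; return t; }
    index_iterator& operator--() { --val_; return *this; }
    index_iterator operator--(int) { index_iterator t(*this); --val_; return t; }

    index_iterator& operator+=(difference_type n) { val_ = static_cast<Int>(val_ + n); return *this; }
    index_iterator& operator-=(difference_type n) { val_ = static_cast<Int>(val_ - n); return *this; }

    friend index_iterator operator+(index_iterator it, difference_type n) { return it += n; }
    friend index_iterator operator+(difference_type n, index_iterator it) { return it += n; }
    friend index_iterator operator-(index_iterator it, difference_type n) { return it -= n; }
    friend difference_type operator-(const index_iterator& a, const index_iterator& b) {
        return static_cast<difference_type>(a.val_) - static_cast<difference_type>(b.val_);
    }

    friend bool operator==(const index_iterator& a, const index_iterator& b) { return a.val_ == b.val_; }
    friend bool operator!=(const index_iterator& a, const index_iterator& b) { return a.val_ != b.val_; }
    friend bool operator<(const index_iterator& a, const index_iterator& b) { return a.val_ < b.val_; }
    friend bool operator>(const index_iterator& a, const index_iterator& b) { return b < a; }
    friend bool operator<=(const index_iterator& a, const index_iterator& b) { return !(b < a); }
    friend bool operator>=(const index_iterator& a, const index_iterator& b) { return !(a < b); }

private:
    Int val_{};
};

// Example use: square every index in [0, n) in parallel, each iteration writing only out[i].
// Assumes n >= 0.
std::vector<long long> squares(int n) {
    std::vector<long long> out(static_cast<std::size_t>(n));
    std::for_each(std::execution::par_unseq, index_iterator<int>(0), index_iterator<int>(n),
                  [&out](int i) { out[static_cast<std::size_t>(i)] = static_cast<long long>(i) * i; });
    return out;
}
```

Returning a const reference to the stored value satisfies the C++17 requirement that a forward iterator's operator* yield a real reference. The trade-off is that the reference lives only as long as the iterator, so take the index by value in the loop body, as above.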

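For posterity, and in case you can use C++20 in the future, std::ranges::iota_view (std::views::iota) will be preferable where available. Note, though, that its iterators formally advertise only std::input_iterator_tag, so whether the C++17-style parallel overloads accept them, and actually run in parallel, depends on your standard library.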

Answer 4:

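Visual C++ provides a rich parallel programming environment, the Concurrency Runtime (ConcRT).
You can use OpenMP, which is an open standard that Visual C++ also supports. As described on Wikipedia, a loop like this is embarrassingly parallel (its iterations are independent). The following code splits the 1000 iterations of the outer loop across a team of threads (by default usually one per core); it does not create 1000 threads: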

#include <omp.h>
...
#pragma omp parallel for
for(int s = 0; s < 1000; s++)
{
    for(int i = 0; i < s; i++)
    {
        // ... do something parallel based on i ...
    }
}

The #pragma omp directives are ignored if the compiler option /openmp (-fopenmp for GCC and Clang) is not specified. In fact, I don't understand the role of your vector, so I omitted it. I also don't understand the reasoning behind replacing the standard for loop with for_each over stored indexes, since a for loop does the job pretty well.
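Here is a minimal runnable sketch of the OpenMP approach. It assumes the work for each index is independent, and it uses a single loop over i rather than the nested loop above; the function names squares_omp and sum_of_squares_omp are hypothetical. Because it calls no omp_* functions, it compiles with or without OpenMP enabled. Without -fopenmp or /openmp, the pragmas are ignored and the loops simply run serially with the same results.

```cpp
#include <cstddef>
#include <vector>

// Each iteration writes only out[i], so the parallel loop needs no synchronization.
// Assumes n >= 0.
std::vector<long long> squares_omp(int n) {
    std::vector<long long> out(static_cast<std::size_t>(n));
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[static_cast<std::size_t>(i)] = static_cast<long long>(i) * i;
    }
    return out;
}

// reduction(+:total) gives each thread a private copy of total and adds the
// copies together when the loop ends, avoiding a data race on the accumulator.
long long sum_of_squares_omp(int n) {
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}
```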
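Alternatively, you can use Microsoft's Parallel Patterns Library (PPL). The following code generates indexes from 0 to 999 inclusive and passes each one to the parallel routine as the variable s. PPL schedules these iterations on the Concurrency Runtime's worker threads rather than creating 1000 threads: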

#include <ppl.h>
...
using namespace concurrency;
parallel_for(0, 1000, [&](int s)
{
   for(int i = 0; i < s; i++)
   {
      // ... do something parallel based on i ...
   }
});

For heavy parallel computations, Visual C++ also offers C++ AMP, which lives in the same concurrency namespace and runs the parallel routines on the GPU instead of the CPU. Note that C++ AMP is deprecated as of Visual Studio 2022.

Source: https://stackoverflow.com/questions/63340193/c-how-to-elegantly-use-c17-parallel-execution-with-for-loop-that-counts-an-i

Tags

c++

parallel-processing

c++17