Calculating the sum of a large vector in parallel

Asked by 隐瞒了意图╮ on 2021-01-06 05:03

Problem background

I have a program that currently takes way too long to sum up large std::vectors of ~100 million elements using std::accumulate.

2 Answers
  • 2021-01-06 05:23

    You can use Boost Asio as a thread pool. But there's not a lot of sense in it unless you have... asynchronous IO operations to coordinate.

    In this answer to "c++ work queues with blocking" I show two thread_pool implementations:

    • Solution #1: one based on boost::asio::io_service
    • Solution #2: the other based on boost::thread primitives

    Both accept any task with a void()-compatible signature. This means you could wrap your function-that-returns-the-important-results in a packaged_task<...> and get the future<RetVal> from it, as sketched below.
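
    For illustration, here is a minimal sketch of that idea using the boost::asio::thread_pool that ships with Boost 1.66 and later (the pools in the linked answer predate it); the vector contents, pool size, and element type are placeholders:

    #include <boost/asio/post.hpp>
    #include <boost/asio/thread_pool.hpp>
    #include <future>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> v(100'000'000, 1);

        boost::asio::thread_pool pool(4); // the worker count is arbitrary here

        // Wrap the summing work in a packaged_task to get a future for the result.
        std::packaged_task<long long()> task([&v] {
            return std::accumulate(v.begin(), v.end(), 0LL);
        });
        auto result = task.get_future();

        // post() only needs a void()-compatible callable; the packaged_task
        // routes the return value through the future instead.
        boost::asio::post(pool, std::move(task));

        std::cout << "Sum: " << result.get() << '\n';
        pool.join();
    }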

  • 2021-01-06 05:37

    Is Boost.Asio suitable for this problem?

    The main purpose of Boost.Asio is to provide an asynchronous model for network and I/O programming, and the problem you describe does not seem to have much to do with networking and I/O.

    I think that the simplest solution is to use the threading primitives provided by either Boost or the C++ standard library.

    A parallel algorithm

    Here's an example of a parallel version of accumulate that uses only the standard library.

    #include <algorithm>
    #include <future>
    #include <iterator>
    #include <numeric>
    #include <thread>
    #include <vector>

    /* Minimum number of elements for the multithreaded algorithm.
       Below this, the algorithm is executed on a single thread. */
    static const int MT_MIN_SIZE = 10000;

    template <typename InputIt, typename T>
    auto parallel_accumulate(InputIt first, InputIt last, T init) {
        // Determine total size.
        const auto size = std::distance(first, last);
        // Determine how many parts the work shall be split into.
        // hardware_concurrency() may return 0; fall back to 1 in that case.
        const std::size_t parts = (size < MT_MIN_SIZE)
            ? 1 : std::max(1u, std::thread::hardware_concurrency());

        std::vector<std::future<T>> futures;

        // For each part, calculate its size and run accumulate on a separate thread.
        for (std::size_t i = 0; i != parts; ++i) {
            // Integer arithmetic that spreads the remainder of size / parts
            // evenly over the parts.
            const auto part_size = (size * (i + 1)) / parts - (size * i) / parts;
            futures.emplace_back(std::async(std::launch::async,
                [=] { return std::accumulate(first, std::next(first, part_size), T{}); }));
            std::advance(first, part_size);
        }

        // Wait for all threads to finish execution and accumulate the results.
        return std::accumulate(std::begin(futures), std::end(futures), init,
            [] (const T prev, auto& future) { return prev + future.get(); });
    }
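
    A minimal call site (the vector size and element values are placeholders); using 0LL as the initial value makes T a 64-bit type, which avoids overflow for large sums:

    std::vector<int> v(100'000'000, 1);
    const auto sum = parallel_accumulate(std::begin(v), std::end(v), 0LL);
    // sum == 100000000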
    

    Live example (the parallel version performs about the same as the sequential one on Coliru, probably because only one core is available)

    Timings

    On my machine (using 8 threads) the parallel version gave, on average, a ~120% boost in performance (46 ms down to 21 ms, roughly a 2.2× speedup).

    Sequential sum:
    Time taken: 46 ms
    5000000050000000
    --------------------------------
    Parallel sum:
    Time taken: 21 ms
    5000000050000000

    However, the absolute gain for 100,000,000 elements is modest (25 ms). The gain might be greater when accumulating an element type that is more expensive to add than int.

    OpenMP

    As noted by @sehe in the comments, OpenMP can provide a simple solution to this problem, e.g.

    #include <cstddef>
    #include <vector>

    template <typename T, typename U>
    auto omp_accumulate(const std::vector<T>& v, U init) {
        U sum = init;

        // Each thread accumulates into a private copy of sum; the
        // reduction(+:sum) clause combines the partial sums at the end.
        // (An unsigned loop variable requires OpenMP 3.0 or later.)
        #pragma omp parallel for reduction(+:sum)
        for (std::size_t i = 0; i < v.size(); i++) {
            sum += v[i];
        }

        return sum;
    }
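
    To actually run in parallel this needs OpenMP enabled at compile time; with GCC or Clang that is the -fopenmp flag (the file name is a placeholder):

    g++ -O2 -fopenmp sum.cpp -o sum

    Without the flag the pragma is ignored and the loop simply runs sequentially, which makes this a low-risk change.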
    

    On my machine this method performed the same as the parallel method using standard thread primitives.

    Sequential sum:
    Time taken: 46 ms
    5000000050000000
    --------------------------------
    Parallel sum:
    Time taken: 21 ms
    Sum: 5000000050000000
    --------------------------------
    OpenMP sum:
    Time taken: 21 ms
    Sum: 5000000050000000
