Why is this C++11 code containing rand() slower with multiple threads than with one?

栀梦 2020-12-13 06:17

I'm experimenting with the new C++11 threads, but my simple test has abysmal multicore performance. As a simple example, this program adds up some squared random numbers.

4 Answers
  • 2020-12-13 06:39

    On my system the behavior is the same, but as Maxim mentioned, rand is not thread-safe. When I change rand to rand_r, the multi-threaded code is faster, as expected.

    void add_multi(int N, double& result) {
        double sum = 0;
        unsigned int seed = time(NULL);   // per-thread seed state for rand_r
        for (int i = 0; i < N; ++i) {
            sum += sqrt(1.0 * rand_r(&seed) / RAND_MAX);
        }
        result = sum / N;
    }
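
    For reference, a minimal driver might look like the sketch below. This is an assumption about the original harness (the question's full program isn't shown): it launches one add_multi worker per hardware thread and joins them.

    #include <cmath>
    #include <cstdlib>
    #include <ctime>
    #include <functional>
    #include <thread>
    #include <vector>

    // add_multi(int N, double& result) as defined above.

    int main() {
        const int N = 10000000;   // iterations per thread (assumed value)
        const unsigned n = std::thread::hardware_concurrency();
        std::vector<double> results(n, 0.0);
        std::vector<std::thread> threads;
        for (unsigned i = 0; i < n; ++i)
            threads.emplace_back(add_multi, N, std::ref(results[i]));
        for (auto& t : threads) t.join();
        // Note: every thread seeds from time(NULL), so seeds can collide;
        // good enough for a timing test, not for statistics.
        return 0;
    }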
    
  • 2020-12-13 06:39

    To make this faster, use a thread pool pattern.

    This will let you enqueue tasks in other threads without the overhead of creating a std::thread each time you want to use more than one thread.

    Don't count the overhead of setting up the queue in your performance metrics, just the time to enqueue and extract the results.

    Create a set of threads and a queue of tasks (a structure containing a std::function<void()>) to feed them. The threads wait on the queue for new tasks, execute them, then go back to waiting.

    The tasks are responsible for communicating their "done-ness" back to the calling context, such as via a std::future<>. The code that lets you enqueue functions into the task queue might do this wrapping for you, i.e., with a signature like this:

    template<typename R = void>
    std::future<R> enqueue( std::function<R()> f ) {
      std::packaged_task<R()> task( std::move(f) );  // wrap f so completion is reported via a future
      std::future<R> retval = task.get_future();
      this->add_to_queue( std::move(task) );         // the queue must accept move-only types
      return retval;
    }
    

    which turns a naked std::function returning R into a nullary packaged_task, then adds that to the task queue. Note that the task queue needs to be move-aware, because packaged_task is move-only.

    Note 1: I am not all that familiar with std::future, so the above could be in error.

    Note 2: If tasks put into the queue described above depend on each other for intermediate results, the pool could deadlock, because there is no provision for "reclaiming" threads that are blocked so they can execute new work. However, non-blocking "pure computation" tasks should work fine with the above model. A sketch of such a pool follows.
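
    Putting the pieces together, here is a minimal sketch of such a pool (class and member names are my own assumptions, not part of the answer). It wraps each packaged_task in a shared_ptr so it can live inside a copyable std::function, which sidesteps the move-aware-queue requirement mentioned above:

    #include <condition_variable>
    #include <cstddef>
    #include <functional>
    #include <future>
    #include <memory>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ThreadPool {
    public:
        explicit ThreadPool(std::size_t n) {
            for (std::size_t i = 0; i < n; ++i)
                workers.emplace_back([this] { worker_loop(); });
        }

        ~ThreadPool() {
            {
                std::lock_guard<std::mutex> lk(m);
                done = true;
            }
            cv.notify_all();
            for (auto& t : workers) t.join();
        }

        template <typename R>
        std::future<R> enqueue(std::function<R()> f) {
            // shared_ptr works around std::function requiring copyable targets
            // (packaged_task itself is move-only, as noted above).
            auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
            std::future<R> result = task->get_future();
            {
                std::lock_guard<std::mutex> lk(m);
                tasks.push([task] { (*task)(); });
            }
            cv.notify_one();
            return result;
        }

    private:
        void worker_loop() {
            for (;;) {
                std::function<void()> job;
                {
                    std::unique_lock<std::mutex> lk(m);
                    cv.wait(lk, [this] { return done || !tasks.empty(); });
                    if (done && tasks.empty()) return;
                    job = std::move(tasks.front());
                    tasks.pop();
                }
                job();  // run outside the lock so other threads can dequeue
            }
        }

        std::vector<std::thread> workers;
        std::queue<std::function<void()>> tasks;
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
    };

    Usage would then look roughly like ThreadPool pool(4); auto f = pool.enqueue<double>([]{ return 42.0; }); double v = f.get(); — the enqueue call returns immediately, and get() blocks until a worker has run the task.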

  • 2020-12-13 06:45

    The time needed to execute the program is very small (33 ms). This means the overhead of creating and managing several threads may exceed the real benefit. Try using workloads that take longer to execute (e.g., 10 seconds).
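
    For a sense of scale, here is a small sketch (an assumed harness, not from the question) that measures just the fixed cost of starting and joining threads that do nothing; if that cost is a noticeable fraction of the 33 ms total, the benchmark is too short to show any speedup:

    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        const unsigned n = std::thread::hardware_concurrency();
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> threads;
        for (unsigned i = 0; i < n; ++i)
            threads.emplace_back([] {});   // empty task: pure creation/join overhead
        for (auto& t : threads) t.join();
        auto stop = std::chrono::steady_clock::now();
        std::cout << "thread start/join overhead: "
                  << std::chrono::duration<double, std::milli>(stop - start).count()
                  << " ms for " << n << " threads\n";
    }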

  • 2020-12-13 06:55

    As you discovered, rand is the culprit here.

    For those who are curious, it's possible that this behavior comes from your implementation of rand using a mutex for thread safety.

    For example, eglibc defines rand in terms of __random, which is defined as:

    long int
    __random ()
    {
      int32_t retval;
    
      __libc_lock_lock (lock);
    
      (void) __random_r (&unsafe_state, &retval);
    
      __libc_lock_unlock (lock);
    
      return retval;
    }
    

    This kind of locking forces the threads to execute rand serially, and with the lock contended on every iteration, the multi-threaded version can end up slower than the single-threaded one.
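
    For completeness, a hedged sketch of a C++11 alternative (not from this answer): give each thread its own engine from <random>, so there is no shared state and no lock at all. The function name and seeding choice below are illustrative assumptions:

    #include <cmath>
    #include <random>

    // Hypothetical rewrite of the worker loop: each thread owns its own
    // generator, so nothing is shared between threads and nothing is locked.
    void add_multi_cxx11(int N, double& result) {
        std::mt19937 gen(std::random_device{}());              // per-thread seed
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        double sum = 0;
        for (int i = 0; i < N; ++i)
            sum += std::sqrt(dist(gen));
        result = sum / N;
    }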
