Matlab limits TBB but not OpenMP

Posted by 拜拜、爱过 on 2019-12-05 16:42:10

Sorry it took so long to answer. Specifying deferred only keeps the task scheduler from creating the thread pool until the first parallel construct starts. By default the number of threads is automatic, which corresponds to the number of cores. The code that sets this is in src/tbb/tbb_misc_ex.cpp (see initialize_hardware_concurrency_info()) and also depends on CPU affinity, among other things.
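
To make the "automatic means number of cores" rule concrete, here is a minimal sketch using only the standard library. It is not TBB's actual logic: kAutomatic and resolve_thread_count are hypothetical names invented for this illustration, and unlike initialize_hardware_concurrency_info() the sketch ignores CPU affinity.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical sentinel for this sketch; it only mimics the role of
// tbb::task_scheduler_init::automatic and is not TBB's constant.
constexpr int kAutomatic = -1;

// Resolve a requested thread count: an explicit positive value is used as-is,
// while kAutomatic falls back to the number of hardware threads reported by
// the standard library (at least 1, since hardware_concurrency() may return 0).
int resolve_thread_count(int requested) {
    assert(requested == kAutomatic || requested > 0);
    if (requested != kAutomatic) return requested;
    unsigned hw = std::thread::hardware_concurrency();
    return static_cast<int>(std::max(1u, hw));
}
```

Calling resolve_thread_count(kAutomatic) on a machine where every core is visible to the process would be expected to return the core count; a process restricted by an affinity mask may see a different number from TBB than from this sketch.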

I modified your code slightly:

#include "tbb/parallel_for_each.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/atomic.h"
#include "tbb/spin_mutex.h"
#include <iostream>
#include <vector>

// If LOW_THREAD == 0, run with task_scheduler_init(automatic), which is the number
// of cores available.  If 1, start with 1 thread.

#ifndef NTASKS
#define NTASKS 50
#endif
#ifndef MAXWORK
#define MAXWORK 400000000L
#endif
#ifndef LOW_THREAD
#define LOW_THREAD 0  // 0 == automatic
#endif

tbb::atomic<size_t> cur_par;
tbb::atomic<size_t> max_par;

#if PRINT_OUTPUT
tbb::spin_mutex print_mutex;
#endif

struct mytask {
  mytask(size_t n) :_n(n) {}
  void operator()() {
      size_t my_par = ++cur_par;
      size_t my_old = max_par;
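      // Atomically raise max_par to my_par if it is currently smaller.
      // Compare against my_par (this task's snapshot), not the live cur_par.
      while( my_old < my_par) { my_old = max_par.compare_and_swap(my_par, my_old); }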

      // Deliberately run slow; the volatile counter keeps an optimizing
      // compiler from deleting the empty loop.
      for (volatile long i=0;i<MAXWORK;++i) {}
#if PRINT_OUTPUT
      {
          tbb::spin_mutex::scoped_lock s(print_mutex);
          std::cerr << "[" << _n << "]";
      }
#endif
      --cur_par;
  }
  size_t _n;
};

template <typename T> struct invoker {
  void operator()(T& it) const {it();}
};

void mexFunction(/*int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]*/) {

    for( size_t thr = LOW_THREAD; thr <= 128; thr = thr ? thr * 2: 1) {
        cur_par = max_par = 0;
        // thr == 0 selects the automatic thread count; otherwise request thr threads.
        tbb::task_scheduler_init init(thr == 0 ? tbb::task_scheduler_init::automatic : static_cast<int>(thr));

        std::vector<mytask> tasks;
        for (int i=0;i<NTASKS;++i) tasks.push_back(mytask(i));

        tbb::parallel_for_each(tasks.begin(),tasks.end(),invoker<mytask>());
        std::cout << " for thr == ";
        if(thr) std::cout << thr; else std::cout << "automatic";
        std::cout << ", maximum parallelism == " << (size_t)max_par << std::endl;
    }
}

int main() {
    mexFunction();
}

I ran this on a 16-core system here:

for thr == automatic, maximum parallelism == 16
for thr == 1, maximum parallelism == 1
for thr == 2, maximum parallelism == 2
for thr == 4, maximum parallelism == 4
for thr == 8, maximum parallelism == 8
for thr == 16, maximum parallelism == 16
for thr == 32, maximum parallelism == 32
for thr == 64, maximum parallelism == 50
for thr == 128, maximum parallelism == 50

The limit of 50 is NTASKS, the total number of tasks created by the program; once there are more threads than tasks, the extra threads have nothing to run.
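
The same cap can be reproduced without TBB. The following is a rough sketch under stated assumptions, not the code that produced the numbers above: it uses plain std::thread workers instead of the TBB scheduler, the function name measure_peak_parallelism is made up for this example, and a short sleep stands in for the busy loop. The measured peak can never exceed the smaller of the thread count and the task count.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Run `ntasks` short tasks on `nthreads` plain std::thread workers and return
// the highest number of tasks that were observed running at the same time.
std::size_t measure_peak_parallelism(std::size_t nthreads, std::size_t ntasks) {
    assert(nthreads > 0);
    std::atomic<std::size_t> next{0};     // index of the next unclaimed task
    std::atomic<std::size_t> running{0};  // tasks currently executing
    std::atomic<std::size_t> peak{0};     // highest value of `running` seen

    auto worker = [&] {
        // Each worker claims task indices until none are left.
        while (next.fetch_add(1) < ntasks) {
            std::size_t mine = running.fetch_add(1) + 1;
            // Atomic "store max": retry until peak is at least `mine`.
            std::size_t seen = peak.load();
            while (seen < mine && !peak.compare_exchange_weak(seen, mine)) {}
            // Stand-in for real work so that tasks overlap in time.
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            running.fetch_sub(1);
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < nthreads; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return peak.load();
}
```

For example, measure_peak_parallelism(64, 50) can report at most 50, for the same reason the thr == 64 and thr == 128 rows above stop at 50. The exact value on a given run depends on thread start-up timing, so only the upper bound is guaranteed.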

The threads created by TBB are shared by all parallel constructs the program starts, so if two parallel_for_each calls run simultaneously, the maximum number of threads does not change; each call simply runs more slowly. The TBB library does not control the number of threads used by OpenMP constructs, so an OpenMP parallel for and a TBB parallel_for_each running at the same time will generally oversubscribe the machine.
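
As a back-of-the-envelope illustration of oversubscription, the sketch below computes how many worker threads are requested per core when two runtimes size their pools independently. The helper threads_per_core is hypothetical, and the assumption that both runtimes default to one thread per core is typical but not guaranteed for any particular OpenMP implementation.

```cpp
#include <cassert>

// Hypothetical helper: worker threads requested per core when a TBB pool and
// an OpenMP team are active at the same time. Values above 1.0 mean there are
// more runnable threads than cores.
double threads_per_core(unsigned tbb_threads, unsigned omp_threads, unsigned cores) {
    assert(cores > 0);
    return static_cast<double>(tbb_threads + omp_threads) / cores;
}
```

On the 16-core system above, threads_per_core(16, 16, 16) gives 2.0, meaning two runnable threads compete for each core if both runtimes use their assumed defaults.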
