tbb

OpenMP overhead

女生的网名这么多〃 posted on 2019-12-24 00:52:35
Question: I have parallelized image convolution and LU factorization using OpenMP and Intel TBB, and I am testing them on 1 to 8 cores. When I run on one core, specifying a single thread with set_num_threads(1) for OpenMP and task_scheduler_init InitTBB(1) for TBB, TBB shows a small degradation compared to the sequential code due to TBB overhead, but surprisingly OpenMP doesn't show any overhead on a single core and performs exactly the same as the sequential code (using Intel O3

Getting Thread Building Blocks (Intel TBB) running on Linux with gcc

落爺英雄遲暮 posted on 2019-12-23 14:57:24
Question: I'm trying to build some tests for Threading Building Blocks. Unfortunately, I'm unable to configure the TBB library: the linker cannot find the library tbb. I've tried running the scripts in the bin directory, which did not help. I've even tried moving the library files to /usr/local/lib/, which was again a flop. Any suggestions would be helpful. Answer 1: Determine where you have put the tbb/lib folder, and then add the path to the library to the LD_LIBRARY_PATH environment variable, either manually

TBB possible memory leak

℡╲_俬逩灬. posted on 2019-12-23 14:56:50
Question: Test program: #include <tbb/parallel_invoke.h> int main(void) { tbb::parallel_invoke([]{},[]{}); return 0; } Compiled with g++ -std=c++11 tmp.cpp -ltbb and checked with valgrind --tool=memcheck --track-origins=yes --leak-check=full --log-file=report ./a.out. libtbb version: 4.0; valgrind version: 3.8.1. Part of the test result: possibly lost: 1,980 bytes in 6 blocks. The question is: is this a TBB bug? Or is this "possibly lost" report actually safe, and it's just some code that valgrind does not

OpenCV TBB IPP OpenMP functions

狂风中的少年 posted on 2019-12-23 10:08:01
Question: Is there a list of OpenCV functions/methods that have been optimized with IPP and/or TBB and/or OpenMP? Answer 1: Disclaimer: I have no experience using OpenCV. I found no such list on the official opencv.org site. However, the ChangeLog says: switched all the remaining parallel loops from TBB-only tbb::parallel_for() to universal cv::parallel_for_() with many possible backends (MS Concurrency, Apple's GCD, OpenMP, Intel TBB etc.) Now we know what to search for, and grep -IRl parallel_for_

A parallel algorithm for order-preserving selection from an index table

限于喜欢 posted on 2019-12-23 03:40:39
Question: Order-preserving selection from an index table is trivial in serial code, but is less straightforward in multi-threaded code, in particular if one wants to retain efficiency (the whole point of multi-threading) by avoiding linked lists. Consider the serial code template<typename T> std::vector<T> select_in_order( std::vector<std::size_t> const&keys, // permutation of 0 ... keys.size()-1 std::vector<T> const&data) // anything copyable { // select data[keys[i]] allowing keys.size() >= data.size()
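The excerpt above cuts off before the function body. As a rough sketch of what such a serial routine could look like, the following assumes that keys which do not index into data (key >= data.size()) are simply skipped; that rule is an assumption read from the comment in the signature, not something the excerpt confirms.

```cpp
#include <cstddef>
#include <vector>

// Serial order-preserving selection: visit the keys in order and copy
// data[key] for every key that is a valid index into data.
// Assumption (not confirmed by the excerpt): out-of-range keys are skipped.
template <typename T>
std::vector<T> select_in_order(std::vector<std::size_t> const& keys,
                               std::vector<T> const& data)
{
    std::vector<T> result;
    for (std::size_t key : keys) {
        if (key < data.size()) {
            result.push_back(data[key]);  // output order follows key order
        }
    }
    return result;
}
```

The difficulty the question points at is visible here: each push_back depends on how many earlier keys were accepted, so a parallel version needs some other way to work out every element's output position.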

How to get return value from a function called which executes in another thread in TBB?

不问归期 posted on 2019-12-22 09:14:27
Question: In the code: #include <tbb/tbb.h> int GetSomething() { int something; // do something return something; } // ... tbb::tbb_thread(GetSomething, NULL); // ... Here GetSomething() is called in another thread via its pointer. But can we get the return value from GetSomething()? How? Answer 1: If you are bound to C++03 and TBB, you have to use output arguments, which means that you have to rewrite your function, e.g.: void GetSomething(int* out_ptr); int var = 23; tbb::tbb_thread(GetSomething, &var); // pay
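The output-argument approach from the answer can be sketched as follows. To keep the example buildable without TBB, std::thread (C++11) stands in for tbb::tbb_thread; that substitution, the helper name RunAndFetch, and the placeholder value 42 are assumptions made for illustration only.

```cpp
#include <thread>

// Rewritten to report its result through an output argument
// instead of a return value.
void GetSomething(int* out_ptr)
{
    *out_ptr = 42;  // placeholder for the real computation
}

// Hypothetical helper: runs GetSomething on another thread and hands
// back whatever it wrote. std::thread is a stand-in for tbb::tbb_thread.
int RunAndFetch()
{
    int var = 23;                            // overwritten by the worker
    std::thread worker(GetSomething, &var);  // pass the address of the result slot
    worker.join();                           // wait before reading var
    return var;
}
```

The variable that receives the result has to outlive the thread, and it should only be read after join() returns; otherwise the read races with the worker's write.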

Debugging in Threading Building Blocks

≯℡__Kan透↙ posted on 2019-12-20 05:45:10
Question: I would like to program in Threading Building Blocks with tasks. But how does one do the debugging in practice? In general, the print method is a solid technique for debugging programs. In my experience with MPI parallelization, the right way to do logging is for each process to print its debugging information to its own file (say "debug_irank", with irank the rank in MPI_COMM_WORLD) so that logical errors can be found. How can something similar be achieved with TBB? It is not clear how to

Using malloc instead of new, and calling the copy constructor when the object is created

自作多情 posted on 2019-12-19 09:03:16
Question: I wanted to try out TBB's scalable_allocator, but was confused when I had to replace some of my code. This is how allocation is done with the allocator: SomeClass* s = scalable_allocator<SomeClass>().allocate( sizeof(SomeClass) ); EDIT: What's shown above is not how allocation is done with scalable_allocator. As ymett correctly mentioned, allocation is done like this: int numberOfObjectsToAllocateFor = 1; SomeClass* s = scalable_allocator<SomeClass>().allocate( numberOfObjectsToAllocateFor );
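The title's second half (calling the copy constructor on storage obtained from an allocator) can be illustrated with placement new. This is a minimal sketch, assuming an allocator with the standard allocate(n)/deallocate(p, n) interface; std::allocator is used as a stand-in so the code builds without TBB, and SomeClass's members plus the helper names clone_with and destroy_with are hypothetical.

```cpp
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical example type; the original post does not show SomeClass.
struct SomeClass {
    std::string name;
    explicit SomeClass(std::string n) : name(std::move(n)) {}
};

// Allocate raw storage for one object, then copy-construct into it.
template <typename Alloc>
SomeClass* clone_with(Alloc alloc, SomeClass const& source)
{
    SomeClass* raw = alloc.allocate(1);      // argument is an object count, not bytes
    try {
        return new (raw) SomeClass(source);  // placement new runs the copy constructor
    } catch (...) {
        alloc.deallocate(raw, 1);            // do not leak storage if the copy throws
        throw;
    }
}

// Destroy the object and return its storage to the same allocator type.
template <typename Alloc>
void destroy_with(Alloc alloc, SomeClass* p)
{
    p->~SomeClass();         // explicit destructor call; plain delete would be wrong here
    alloc.deallocate(p, 1);
}
```

With std::allocator<SomeClass> alloc, a call such as clone_with(alloc, original) returns a separately allocated copy, which is later released with destroy_with(alloc, copy). An allocator type exposing the same interface could presumably be passed in its place.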

AMD multi-core programming

前提是你 posted on 2019-12-19 07:42:41
Question: I want to start writing applications (C++) that will use the additional cores to execute portions of the code that need to perform lots of calculations and whose computations are independent of each other. I have the following processor: x64 Family 15 Model 104 Stepping 2 Authentic AMD ~1900 MHz, running Windows Vista Home Premium 32-bit and openSUSE 11.0 64-bit. On Intel platforms, I've used the following APIs: Intel TBB and OpenMP. Do they work on AMD and does AMD have

AMD multi-core programming

梦想与她 posted on 2019-12-19 07:42:21
Question: I want to start writing applications (C++) that will use the additional cores to execute portions of the code that need to perform lots of calculations and whose computations are independent of each other. I have the following processor: x64 Family 15 Model 104 Stepping 2 Authentic AMD ~1900 MHz, running Windows Vista Home Premium 32-bit and openSUSE 11.0 64-bit. On Intel platforms, I've used the following APIs: Intel TBB and OpenMP. Do they work on AMD and does AMD have