OpenMP overhead

Submitted by 女生的网名这么多〃 on 2019-12-24 00:52:35

Question


I have parallelized image convolution and LU factorization using OpenMP and Intel TBB, and I am testing them on 1 to 8 cores. When I run on one core, specifying a single thread with omp_set_num_threads(1) for OpenMP and task_scheduler_init InitTBB(1) for TBB, the TBB version shows a small slowdown compared to the sequential code because of TBB overhead. Surprisingly, the OpenMP version shows no overhead on a single core and performs exactly like the sequential code (built with the Intel compiler at optimization level O3). I am using static scheduling for the OpenMP loops. Is this realistic, or am I making a mistake?


Answer 1:


The OpenMP runtime will probably not create any threads if you run it with just one thread.

Also, just adding OpenMP parallelization directives sometimes makes serial code run faster too, because you are essentially giving the compiler more information. A work-sharing construct, for example, tells the compiler that the iterations of the loop are independent of each other. The compiler might not have been able to deduce that on its own, and knowing it allows more aggressive optimization strategies. Not always, of course, but I have seen it happen with "real world code".
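As a rough illustration of the work-sharing point above, here is a minimal sketch of a statically scheduled loop. The function name `convolve1d` and the 1D "valid" convolution it computes are assumptions made for the example, not the asker's code. The directive states that each iteration writes only its own `out[i]`, which is the independence information the answer refers to. If the file is compiled without OpenMP enabled, the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: 1D "valid" convolution (strictly, a correlation)
// of `in` with `kernel`. Returns an empty vector if the kernel is empty
// or longer than the input.
std::vector<double> convolve1d(const std::vector<double>& in,
                               const std::vector<double>& kernel) {
    if (kernel.empty() || in.size() < kernel.size()) return {};
    const long n = static_cast<long>(in.size() - kernel.size() + 1);
    std::vector<double> out(static_cast<std::size_t>(n));

    // The directive asserts that iterations are independent of each other:
    // iteration i reads `in` and `kernel` and writes only out[i].
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        double acc = 0.0;
        for (std::size_t k = 0; k < kernel.size(); ++k)
            acc += in[static_cast<std::size_t>(i) + k] * kernel[k];
        out[static_cast<std::size_t>(i)] = acc;
    }
    return out;
}
```

Whether the compiler actually produces faster serial code from this hint depends on the compiler and the loop body, so it is worth measuring rather than assuming.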




Answer 2:


With OpenMP, the compiler does all the work. If the compiler knows the code will always run serially, it can quite legitimately skip all of the parallel bits.

TBB, as I understand it, is basically just a library. Your algorithm always has to be decorated with the parts needed to run it in parallel as well as serially.




Answer 3:


OpenMP forks a decorated part of the code (#pragma omp for/parallel) into a main thread (which would also execute it without OpenMP) and additional threads.

If you configure it to use only 1 thread, then there is only the main thread, which executes the code as it would without the OpenMP directive. There is no overhead, because the execution path wasn't forked.
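One way to check the single-thread case described above is to ask the runtime how many threads actually execute a parallel region. The sketch below is only an illustration: `team_size_with` is a hypothetical helper name, while `omp_set_num_threads` and `omp_get_num_threads` are standard OpenMP runtime calls. The `_OPENMP` guard lets the file compile even when OpenMP is not enabled, in which case the function simply reports one thread.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: requests `requested` threads and returns the size
// of the team that actually executes a parallel region.
int team_size_with(int requested) {
    int team = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        // Exactly one thread of the team records the team size.
        #pragma omp single
        team = omp_get_num_threads();
    }
#else
    (void)requested;  // OpenMP disabled: the region would run serially.
#endif
    return team;
}
```

A team size of 1 confirms that no extra threads were started. It does not prove that the region costs nothing at all, so timing the region against the plain sequential loop is still the real test.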




Answer 4:


The thing about OpenMP is that the compiler does the work for you. It requires minimal modification to the sequential code and often gives reasonably good results if the tasks given to each thread are fairly large. I would suggest testing your code with Pthreads or C++11 std::thread and comparing the results.
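For comparison, a hand-written version with C++11 `std::thread` might look like the sketch below. The helper `scale_in_chunks` and the trivial scaling loop are assumptions chosen to keep the example short. The contiguous block partitioning mimics what a static OpenMP schedule does, but here every thread is created and joined explicitly, even when `num_threads` is 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: multiplies every element of `data` by `factor`,
// splitting the index range into contiguous chunks, one per thread.
void scale_in_chunks(std::vector<double>& data, double factor,
                     unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t n = data.size();
    // Chunk size rounded up so that all elements are covered.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&data, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
        });
    }
    for (auto& w : workers) w.join();
}
```

Timing this against the OpenMP and sequential versions with one thread gives a rough idea of how much of the cost comes from thread creation itself.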



Source: https://stackoverflow.com/questions/7301317/openmp-overhead
