OpenMP overhead
Question: I have parallelized image convolution and LU factorization using OpenMP and Intel TBB, and I am testing both on 1 to 8 cores. When I restrict each version to a single thread, for example with set_num_threads(1) for OpenMP and task_scheduler_init InitTBB(1) for TBB, the TBB version shows a small degradation compared to the sequential code because of TBB overhead. Surprisingly, the OpenMP version doesn't show any overhead on a single core and performs exactly the same as the sequential code (using Intel O3