The question may sound basic, but I could not find a concrete answer to it. Say we have a multicore processor like a Core i5 680 (2 physical cores, with HT enabled
It depends a bit on which version and features of OpenMP you're considering, as I believe later versions may give you more features, but the original library was built around data-parallel for primitives. In general, OpenMP and other data-parallel programming models try to abstract away the underlying hardware: the programmer declares the computation as a series of operations on data, which are then scheduled by OMP.
To answer your first question: the OS scheduler will schedule threads across cores, and the OMP scheduler will schedule work across the available threads.
#pragma omp parallel for
for (i = 0; i < N; i++)
a[i] = 2 * i;
The OMP runtime decides how many threads to use and how to divide the iterations among them, depending on factors such as the amount of work being given to it and any hints you might have provided. The OS then decides which cores (real or HT) those threads run on, taking their load into account. One would expect the code above to run on all the available logical cores (4 in your example).
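To see the OMP side of this in practice, here is a minimal sketch that records which OpenMP thread executed each iteration. The helper name fill_and_record and the owner array are made up for illustration, and the _OPENMP guards are only there so the code still builds serially when OpenMP is switched off. Which core each thread ran on is decided by the OS and is not visible here.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: fills a[i] = 2*i and stores in owner[i] the id of the
   OpenMP thread that executed iteration i. Returns the size of the team. */
int fill_and_record(int *a, int *owner, int n)
{
    assert(n >= 0);
    int team = 1;                 /* serial fallback: a team of one */

    #pragma omp parallel
    {
        int tid = 0;              /* private to each thread */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        #pragma omp single
        team = omp_get_num_threads();
#endif
        /* The iterations of this loop are shared out among the team. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            a[i] = 2 * i;
            owner[i] = tid;
        }
    }
    return team;
}
```

Printing owner after a call shows how the runtime split the iteration space. With the default schedule the split is implementation-defined, so the exact pattern may differ between compilers.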
You can use the schedule clause to control how the scheduler allocates work.
schedule(type, chunk): This is useful if the work sharing construct is a do-loop or for-loop. The iteration(s) in the work sharing construct are assigned to threads according to the scheduling method defined by this clause. The three types of scheduling are:
static: Here, all the threads are allocated iterations before they execute the loop iterations. The iterations are divided among threads equally by default. However, specifying an integer for the parameter chunk will allocate chunk number of contiguous iterations to a particular thread.
dynamic: Here, some of the iterations are allocated to a smaller number of threads. Once a particular thread finishes its allocated iteration, it returns to get another one from the iterations that are left. The parameter chunk defines the number of contiguous iterations that are allocated to a thread at a time.
guided: A large chunk of contiguous iterations are allocated to each thread dynamically (as above). The chunk size decreases exponentially with each successive allocation to a minimum size specified in the parameter chunk
From Wikipedia
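As a rough sketch of how the three types look in code, the same loop can be written once per schedule. The helper name fill_three_ways and the chunk size of 4 are arbitrary choices for illustration, not recommendations.

```c
#include <assert.h>

/* Hypothetical helper: runs the same loop body under each schedule type.
   All three arrays end up with identical contents; only the way iterations
   are handed to threads differs. */
void fill_three_ways(int *s, int *d, int *g, int n)
{
    assert(n >= 0);

    /* static: blocks of 4 iterations are assigned to threads up front */
    #pragma omp parallel for schedule(static, 4)
    for (int i = 0; i < n; i++)
        s[i] = 2 * i;

    /* dynamic: a thread fetches the next block of 4 when it finishes one */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n; i++)
        d[i] = 2 * i;

    /* guided: blocks start large and shrink towards a minimum of 4 */
    #pragma omp parallel for schedule(guided, 4)
    for (int i = 0; i < n; i++)
        g[i] = 2 * i;
}
```

For a loop body this cheap and uniform, static is usually the sensible choice. dynamic and guided tend to pay off when iterations take uneven amounts of time.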
To address your second question: you can also use the num_threads clause to specify the number of threads to be used. Replacing the #pragma omp parallel for in the example with the following would limit OMP to at most three threads, regardless of whether more were available.
#pragma omp parallel num_threads(3)
#pragma omp for
for (i = 0; i < N; i++)
a[i] = 2 * i;
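If you want to check how many threads you actually got, a small sketch along these lines should work. The helper name fill_with_threads is made up for illustration. Note that num_threads is a request: the runtime may hand out fewer threads than asked for, but not more.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs the loop with at most `requested` threads and
   returns the team size the runtime actually used. */
int fill_with_threads(int *a, int n, int requested)
{
    assert(n >= 0 && requested >= 1);
    int team = 1;                 /* serial fallback: a team of one */

    #pragma omp parallel num_threads(requested)
    {
#ifdef _OPENMP
        /* One thread records the team size for the caller. */
        #pragma omp single
        team = omp_get_num_threads();
#endif
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = 2 * i;
    }
    return team;
}
```

The same limit can be set for the whole program with omp_set_num_threads() or the OMP_NUM_THREADS environment variable, instead of on each parallel region.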
It is also possible to control, to some extent, how work is scheduled across different processors in a multi-processor (more than one socket) system; see the question "OpenMP and NUMA relation?".
You might also find the following guide useful: Guide into OpenMP: Easy multithreading programming for C++.