I recently wrote a small number-crunching program that basically loops over an N-dimensional grid and performs some calculation at each point.
for (int i1 =
The first approach is simple. It is also sufficient if you expect that the load will be balanced evenly over the threads. In some cases, especially if the complexity of bin_index is very dependant on the parameter values, one of the threads could end up with a much heavier task than the rest. Remember: the task is finished when the last threads finishes.
The second approach is a bit more complicated, but balances the load more evenly if the tasks are finegrained enough (the number of tasks is much larger than the number of threads).
Note that you may have issues putting the calculations in separate threads. Make sure that bin_index works correctly when multiple threads execute it simultaneously. Beware of the use of global or static variables for intermediate results.
Also, "histogram[bin_index(i1, i2, i3, i4)] += 1" could be interrupted by another thread, causing the result to be incorrect (if the assignment fetches the value, increments it and stores the resulting value in the array). You could introduce a local histogram for each thread and combine the results to a single histogram when all threads have finished. You could also make sure that only one thread is modifying the histogram at the same time, but that may cause the threads to block each other most of the time.