OPENMP F90/95 Nested DO loops - problems getting improvement over serial implementation

后端未结

关注

 2  924

I\'ve done some searching but couldn\'t find anything that appeared to be related to my question (sorry if my question is redundant!). Anyway, as the title states, I\'m hav

相关标签:

2条回答

难免孤独

2021-01-14 09:05

What you have is a convolution. This can be done with a Fast Fourier Transform in N log2(N) time. Your algorithm is N^2. If you use FFT, one core will probably be enough!

0 讨论(0)
发布评论:

提交评论
- 加载中...
忘了有多久

2021-01-14 09:14
The obvious place to put the omp pragma is at the very outside loop.

For every (l,m,n), you're calculating a convolution between your perturbed variables and an exponential smoother. Each (l,m,n) calculation is completely independant from the others, so you can put it on the outermost loop. So for instance the simplest thing
```
!$omp parallel do private(n,m,l,i,j,k,exp_smoother) shared(sum_u_pert,sum_v_pert,sum_t_pert,u_p,v_p,t_p), default(none)
do n=1,lm
  do m=1,jm
    do l=1,im
      do k=1,lm
        do j=1,jm
          do i=1,im
            exp_smoother=exp(-(abs(i-l)/hzscl)-(abs(j-m)/hzscl)-(abs(k-n)/vscl))
            sum_u_pert(l,m,n) = sum_u_pert(l,m,n) + u_p(i,j,k) * exp_smoother
            sum_v_pert(l,m,n) = sum_v_pert(l,m,n) + v_p(i,j,k) * exp_smoother
            sum_t_pert(l,m,n) = sum_t_pert(l,m,n) + t_p(i,j,k) * exp_smoother
          end do
        end do
      end do
    end do
  end do
end do
```
gives me a ~6x speedup on 8 cores (using a much reduced problem size of 20x41x41). Given the amount of work there is to do in the loops, even at the smaller size, I assume the reason it's not an 8x speedup involves memory contension or false sharing; for further performance tuning you might want to explicitly break the sum arrays into sub-blocks for each thread, and combine them at the end; but depending on the problem size, having the equivalent of an extra im x jm x lm sized array might not be desirable.

It seems like there's a lot of structure in this problem you aught to be able to explot to speed up even the serial case, but it's easier to say that then to find it; playing around on pen and paper nothing comes to mind in a few minutes, but someone cleverer may spot something.
0 讨论(0)
发布评论:

提交评论
- 加载中...