I have a loop which updates a matrix A, and I want to parallelise it with OpenMP, but I'm not sure which variables should be shared and which private. I would have thought that making just ii and jj private would work.
As written, you will need some synchronisation to avoid a race condition. Consider the two-thread case. Say thread 0 starts with ii=1, and so considers jj=2,3,4,..., and thread 1 starts with ii=2, and so considers jj=3,4,5,6,.... Thus, as written, it is possible that thread 0 is handling ii=1, jj=3 while thread 1 is handling ii=2, jj=3 at the same time. This could clearly cause problems at the line
```fortran
A(jj,:)=A(jj,:)+(M(ii)/coff)*(distance_vector)
```
as both threads have the same value of jj, and so both update the same row of A at once. So yes, you do need to synchronise the updates to A to avoid a race, though I must admit a good way isn't immediately obvious to me. I'll think on it and edit if something occurs to me.
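One way to remove the race, offered here as a sketch and not as the answer's own method, is to give each thread a private, zero-initialised copy of A (and of PE) and let OpenMP add the copies into the originals at the end with a `reduction` clause; Fortran allows whole arrays in `reduction`. The sketch assumes the column-major layout from point 1 below (`x(3,n)`, `a(3,n)`), double-precision reals, and made-up names (`pairwise_mod`, `pairwise_forces`). It also answers the original question: X, M and N can be shared because they are only read, jj and the scratch variables must be private, and A and PE need a reduction.

```fortran
module pairwise_mod
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Hypothetical routine: adds the pairwise terms into a(3,n) and pe.
  subroutine pairwise_forces(n, x, m, a, pe)
    integer,  intent(in)    :: n
    real(dp), intent(in)    :: x(3,n), m(n)
    real(dp), intent(inout) :: a(3,n)
    real(dp), intent(inout) :: pe
    integer  :: ii, jj
    real(dp) :: distance_vector(3), distance2, distance, coff

    ! ii is private automatically as the parallel loop index.
    ! reduction(+:a, pe): each thread accumulates into its own zeroed copy,
    ! and the copies are summed into a and pe when the loop ends, so no two
    ! threads ever write the same element at the same time.
    !$omp parallel do default(none) shared(n, x, m) &
    !$omp   private(jj, distance_vector, distance2, distance, coff) &
    !$omp   reduction(+:a, pe) schedule(dynamic)
    do ii = 1, n - 1
      do jj = ii + 1, n
        distance_vector = x(:,ii) - x(:,jj)
        distance2 = sum(distance_vector * distance_vector)
        distance  = sqrt(distance2)
        coff      = distance * distance * distance
        pe = pe - m(ii) * m(jj) / distance
        a(:,jj) = a(:,jj) + (m(ii) / coff) * distance_vector
        a(:,ii) = a(:,ii) - (m(jj) / coff) * distance_vector
      end do
    end do
    !$omp end parallel do
  end subroutine pairwise_forces
end module pairwise_mod
```

`schedule(dynamic)` is one reasonable choice here because the inner loop shrinks as ii grows, so the default static split would hand the first thread noticeably more work. The price of the reduction is one private copy of A per thread, which is usually acceptable for modest N.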
However, I have three other comments:
1) Your memory access pattern is horrible, and fixing it will, I expect, give at least as much speedup as any OpenMP parallelisation, with a lot less hassle. In Fortran you want the first index to vary fastest: arrays are stored in column-major order, so this keeps memory accesses spatially local and makes good use of the memory hierarchy. Given that this is the most important thing for good performance on a modern machine, you should really try to get it right. So the loop would be better if you can arrange the arrays with the coordinate index first, so that it can be written as
```fortran
do ii=1,N-1
   do jj=ii+1,N
      ! X(:,ii) and X(:,jj) are contiguous columns in column-major order
      distance_vector=X(:,ii)-X(:,jj)
      distance2=sum(distance_vector*distance_vector)
      distance=sqrt(distance2)
      coff=distance*distance*distance
      PE=PE-M(ii)*M(jj)/distance
      ! each update touches one contiguous column of A
      A(:,jj)=A(:,jj)+(M(ii)/coff)*(distance_vector)
      A(:,ii)=A(:,ii)-(M(jj)/coff)*(distance_vector)
   end do
end do
```
Note how this goes down the first index, rather than the second as you have.
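If your arrays are currently declared the other way round, e.g. with one body per row as `X(N,3)`, one option is to copy them once, before the loop, with the standard `transpose` intrinsic. This is only a sketch: the helper name `to_column_layout` is hypothetical, and for large N it is usually better to declare the arrays as `X(3,N)` from the start so no copy is needed at all.

```fortran
module layout_mod
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Hypothetical helper: converts x_rows(n,3) (one body per row) into
  ! x_cols(3,n) (one body per column). Because Fortran is column-major,
  ! the coordinates of body i are then adjacent in memory, so x_cols(:,i)
  ! is a contiguous slice.
  function to_column_layout(x_rows) result(x_cols)
    real(dp), intent(in) :: x_rows(:,:)
    real(dp) :: x_cols(size(x_rows, 2), size(x_rows, 1))
    x_cols = transpose(x_rows)
  end function to_column_layout
end module layout_mod
```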
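2) If you do use OpenMP, I strongly suggest you use default(none): it forces you to state explicitly whether each variable in the parallel region is shared, private or a reduction variable, and that helps avoid nasty bugs. If you were one of my students you'd lose loads of marks for not doing this!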
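A minimal sketch of what `default(none)` buys you, using a made-up example (`sum_of_squares`) rather than the loop above. Every variable referenced in the region has to appear in a data-sharing clause, so if you delete `tmp` from the `private` list, a compiler with OpenMP enabled rejects the code instead of silently treating `tmp` as shared and introducing a race.

```fortran
module default_none_demo
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Made-up example: the sum of the squares of x(1:n).
  function sum_of_squares(n, x) result(total)
    integer,  intent(in) :: n
    real(dp), intent(in) :: x(n)
    real(dp) :: total
    real(dp) :: acc, tmp
    integer  :: i

    acc = 0.0_dp
    ! With default(none) every variable used in the region needs an
    ! explicit attribute (the loop index i is private automatically):
    !   n, x -> shared    (only read)
    !   tmp  -> private   (per-thread scratch value)
    !   acc  -> reduction (per-thread partial sums added at the end)
    !$omp parallel do default(none) shared(n, x) private(tmp) reduction(+:acc)
    do i = 1, n
      tmp = x(i) * x(i)
      acc = acc + tmp
    end do
    !$omp end parallel do
    total = acc
  end function sum_of_squares
end module default_none_demo
```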
3) DSQRT is archaic: in modern Fortran (i.e. anything after 1967) the generic sqrt is good enough in all but a few obscure cases, and it is more flexible, since it works for any real or complex kind rather than only double precision.
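A small sketch of that flexibility, assuming a working-precision parameter `wp` (a common idiom, not something from the question). Because the generic `sqrt` takes its result kind from its argument, switching `wp` from `real64` to `real32` needs no other change; a call to `DSQRT`, which only accepts double precision, would be rejected by the compiler after such a switch. The module and function names are made up.

```fortran
module distance_mod
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  ! Working precision: change to real32 and nothing else needs editing,
  ! because the generic sqrt follows the kind of its argument.
  integer, parameter :: wp = real64
contains
  ! Euclidean distance between two points of any dimension.
  pure function distance(p, q) result(d)
    real(wp), intent(in) :: p(:), q(:)
    real(wp) :: d
    d = sqrt(sum((p - q)**2))
  end function distance
end module distance_mod
```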