Question
I'm trying to compute the pressure tensor of a crystal structure. To do so, I have to go through all pairs of particles, as in the simplified code below:
do i=1, atom_number                      ! sum over atoms i
   type1 = ATOMS(i)
   do nj=POINT(i), POINT(i+1)-1          ! sum over atoms j of i's atoms list
      j = LIST(nj)
      type2 = ATOMS(j)
      call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
      call distance_sqr2(i,j,r,VECT_R)
      call gettensor_rij(i,j,T)
      r = sqrt(r)
      if (r<coub_cutoff) then
         local_sum_real(id+1) = local_sum_real(id+1) + ( erfc(alpha_ewald*r)
         derivative = -( erfc(alpha_ewald*r)
         ff = - (1.0d0/r)*derivative
         STRESS_EWALDD = STRESS_EWALDD + ff * T    ! A 3x3 matrix
         FDX(i) = FDX(i) + VECT_R(1) * ff
         FDY(i) = FDY(i) + VECT_R(2) * ff
         FDZ(i) = FDZ(i) + VECT_R(3) * ff
         FDX(j) = FDX(j) - VECT_R(1) * ff
         FDY(j) = FDY(j) - VECT_R(2) * ff
         FDZ(j) = FDZ(j) - VECT_R(3) * ff
      end if
   end do                                ! sum over atoms j
   sum_kin = sum_kin + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)
   call gettensor_v(i,Q)
   STRESS_KINETIC = STRESS_KINETIC + ATMMASS(i) * Q
end do
I tried to parallelize this double loop with the "parallel do" directive but ran into problems with the tensor STRESS_EWALDD and the forces FDX, .... I therefore tried to assign a set of particles to each thread manually, as in the following code, but I still get the wrong tensor value.
!$omp parallel private(id,j,i,nj,type1,type2,T,Q,r,VECT_R,derivative,Aab,Bab,Cab,Dab,ff)
id = omp_get_thread_num()
do i=id+1, atom_number, nthreads         ! sum over atoms i
   type1 = ATOMS(i)
   do nj=POINT(i), POINT(i+1)-1          ! sum over atoms j of i's atoms list
      j = LIST(nj)
      type2 = ATOMS(j)
      call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
      call distance_sqr2(i,j,r,VECT_R)
      call gettensor_rij(i,j,T)
      r = sqrt(r)
      if (r<coub_cutoff) then
         local_sum_real(id+1) = local_sum_real(id+1) + ( erfc(alpha_ewald*r)
         derivative = -( erfc(alpha_ewald*r)
         ff = - (1.0d0/r)*derivative
         local_STRESS_EWALDD(id+1,:,:) = local_STRESS_EWALDD(id+1,:,:) + ff * T
         FDX(i) = FDX(i) + VECT_R(1) * ff
         FDY(i) = FDY(i) + VECT_R(2) * ff
         FDZ(i) = FDZ(i) + VECT_R(3) * ff
         FDX(j) = FDX(j) - VECT_R(1) * ff
         FDY(j) = FDY(j) - VECT_R(2) * ff
         FDZ(j) = FDZ(j) - VECT_R(3) * ff
      end if
   end do                                ! sum over atoms j
   local_sum_kin(id+1) = local_sum_kin(id+1) + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)
   call gettensor_v(i,Q)
   local_STRESS_KINETIC(id+1,:,:) = local_STRESS_KINETIC(id+1,:,:) + ATMMASS(i) * Q
end do                                   ! sum over atoms i
!$omp end parallel

do i=1,nthreads
   sum_real = sum_real + local_sum_real(i)
   sum_virial_buck = sum_virial_buck + local_sum_virial_buck(i)
   STRESS_BUCK = STRESS_BUCK + local_STRESS_BUCK(i,:,:)
   STRESS_EWALDD = STRESS_EWALDD + local_STRESS_EWALDD(i,:,:)
   sum_buck = sum_buck + local_sum_buck(i)
   sum_kin = sum_kin + local_sum_kin(i)
   STRESS_KINETIC = STRESS_KINETIC + local_STRESS_KINETIC(i,:,:)
end do
The scalar sums and STRESS_KINETIC come out correct, but STRESS_EWALDD is wrong and I can't figure out why. I haven't looked at the forces yet. So I'd really appreciate some hints here. Thanks,
Éric.
Answer 1:
You have taken a somewhat unorthodox approach to using OpenMP.
OpenMP provides the reduction(op:vars) clause, which performs the reduction op over thread-local copies of the variables in vars, and you should use it for STRESS_EWALDD, sum_kin, sum_real and STRESS_KINETIC. Force accumulation for the i-th atom should work, since the atoms in the Verlet list POINT are ordered (you took the code that builds it from the book by Allen & Tildesley, right?), but not for the j-th atom. That's why you should perform a reduction on the force arrays too. Don't worry if you read in some older OpenMP manual that reduction variables have to be scalar - newer OpenMP versions support reduction over array variables in Fortran 90+ (a minimal standalone sketch of this is shown after the rewritten loop below).
!$OMP PARALLEL DO PRIVATE(....) &
!$OMP&   REDUCTION(+:FDX,FDY,FDZ,sum_kin,sum_real,STRESS_EWALDD,STRESS_KINETIC) &
!$OMP&   SCHEDULE(DYNAMIC)
do i=1, atom_number                      ! sum over atoms i
   type1 = ATOMS(i)
   do nj=POINT(i), POINT(i+1)-1          ! sum over atoms j of i's atoms list
      j = LIST(nj)
      type2 = ATOMS(j)
      call get_ff_param(type1,type2,Aab,Bab,Cab,Dab)
      call distance_sqr2(i,j,r,VECT_R)
      call gettensor_rij(i,j,T)
      r = sqrt(r)
      if (r<coub_cutoff) then
         sum_real = sum_real + ( erfc(alpha_ewald*r)
         derivative = -( erfc(alpha_ewald*r)
         ff = - (1.0d0/r)*derivative
         STRESS_EWALDD = STRESS_EWALDD + ff * T    ! A 3x3 matrix
         FDX(i) = FDX(i) + VECT_R(1) * ff
         FDY(i) = FDY(i) + VECT_R(2) * ff
         FDZ(i) = FDZ(i) + VECT_R(3) * ff
         FDX(j) = FDX(j) - VECT_R(1) * ff
         FDY(j) = FDY(j) - VECT_R(2) * ff
         FDZ(j) = FDZ(j) - VECT_R(3) * ff
      end if
   end do                                ! sum over atoms j
   sum_kin = sum_kin + ATMMASS(i) * (VX(i)**2 + VY(i)**2 + VZ(i)**2)
   call gettensor_v(i,Q)
   STRESS_KINETIC = STRESS_KINETIC + ATMMASS(i) * Q
end do
!$OMP END PARALLEL DO
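If you want to convince yourself that array reduction behaves as advertised before touching the simulation code, a tiny standalone program like the one below should do it when compiled with OpenMP enabled (e.g. gfortran -fopenmp). It is only an illustrative sketch: the names demo_array_reduction, acc and nsteps are made up here and nothing in it comes from your code.

! Minimal sketch of an OpenMP reduction over an array variable; all names
! (demo_array_reduction, acc, nsteps) are illustrative placeholders.
program demo_array_reduction
   implicit none
   integer, parameter :: nsteps = 1000
   real(kind=8) :: acc(3,3)
   integer :: k

   acc = 0.0d0

   ! each iteration adds its contribution to the whole 3x3 accumulator,
   ! just as ff * T is added to STRESS_EWALDD in the loop above
   !$omp parallel do reduction(+:acc)
   do k = 1, nsteps
      acc = acc + real(k, kind=8)
   end do
   !$omp end parallel do

   ! every element should print as nsteps*(nsteps+1)/2 = 500500.0
   print *, acc
end program demo_array_reduction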
Using dynamic loop scheduling (the SCHEDULE(DYNAMIC) clause above) will reduce the load imbalance when the number of neighbours of each atom varies wildly.
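If the neighbour lists are short, handing out iterations one at a time adds scheduling overhead, and a chunked dynamic schedule is a reasonable compromise. The sketch below only illustrates the clause; the chunk size 16 and all names (demo_dynamic_schedule, total) are placeholders to tune, not values taken from your problem.

! Minimal sketch of schedule(dynamic,chunk) on a deliberately imbalanced loop;
! the chunk size 16 and all names here are illustrative placeholders.
program demo_dynamic_schedule
   implicit none
   integer, parameter :: n = 100000
   real(kind=8) :: total
   integer :: i, k

   total = 0.0d0

   ! the inner loop length varies with i, mimicking atoms whose neighbour
   ! counts differ wildly; dynamic scheduling keeps all threads busy
   !$omp parallel do schedule(dynamic,16) private(k) reduction(+:total)
   do i = 1, n
      do k = 1, mod(i, 1000)
         total = total + 1.0d0/real(k, kind=8)
      end do
   end do
   !$omp end parallel do

   print *, total
end program demo_dynamic_schedule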
Source: https://stackoverflow.com/questions/10811302/openmp-and-array-summation-with-fortran-90