Sum a variable over all threads in a CUDA Kernel and return it to Host
Question

I am new to CUDA and I'm trying to implement a kernel to calculate the energy of my Metropolis Monte Carlo simulation. Here is the serial (CPU) version of the function:

    float calc_energy(struct frame frm, float L, float rc){
        int i,j;
        float E=0, rij, dx, dy, dz;
        for(i=0; i<frm.natm; i++)
        {
            for(j=i+1; j<frm.natm; j++)
            {
                dx = fabs(frm.conf[j][0] - frm.conf[i][0]);
                dy = fabs(frm.conf[j][1] - frm.conf[i][1]);
                dz = fabs(frm.conf[j][2] - frm.conf[i][2]);
                dx = dx - round(dx/L)*L;
                dy = dy - round(dy/L)*L;