Question
I wrote (for my class in Numerical Methods for Theoretical Physics) a very simple program for a random walk in two dimensions. Here it is:
    program random_walk
      implicit none
      integer, parameter :: Nwalker = 1000000
      integer, parameter :: Nstep = 100
      integer, parameter :: Nmeas = 10
      integer :: posx, posy, move
      integer :: is, im, iw
      real :: start_time, stop_time
      double precision, dimension(Nmeas) :: dist, r2
      real :: rnd

      do im = 1, Nmeas
        dist(im) = im*Nstep
        r2(im) = 0.0
      end do

      call cpu_time(start_time)
      do iw = 1, Nwalker
        posx = 0
        posy = 0
        do im = 1, Nmeas
          do is = 1, Nstep
            call random_number(rnd)
            move = 4*rnd
            if (move == 0) posx = posx + 1
            if (move == 1) posy = posy + 1
            if (move == 2) posx = posx - 1
            if (move == 3) posy = posy - 1
          end do
          r2(im) = r2(im) + posx**2 + posy**2
        end do
      end do
      r2 = r2 / Nwalker
      call cpu_time(stop_time)

      do im = 1, Nmeas
        print '(f8.6, " ", f8.6)', log(dist(im)), log(r2(im))
      end do
      print '("Time = ", f6.3, " seconds")', stop_time - start_time
    end program
In the end it should print 10 rows and 2 columns: the first column is the logarithm of the "time" (number of steps), the second column is the logarithm of the average squared distance from the origin. The second column should "on average" be equal to the first. So far so good: the program works well and the results are very reasonable. But here is the problem: on my MacBook Pro (2.7 GHz Intel Core i7, compiler gfortran 7.1.0, optimization -O2) it takes on average more than 20 seconds to run. But if I comment out these lines:
    ! do im = 1, Nmeas
    !   print '(f8.6, " ", f8.6)', log(dist(im)), log(r2(im))
    ! end do
which come after the stop_time measurement, the running time drops to... less than 6 seconds!?
How is it possible?
Answer 1:
This is quite a typical thing to observe. People hit it when they write artificial computations that only test performance and do not produce a useful result. When the result is not printed, the compiler can recognize that the program output does not depend on it and may omit the computation completely. The timed region then contains less work, even though the lines you commented out sit after the stop_time call.
To examine this, you can add the -fdump-tree-optimized flag to get a special intermediate form called GIMPLE, and compare the output for the two variants of the source code. It writes the output to a file called yourfilename.f90.something.optimized. I can indeed see a big part missing: basically the whole r2 array and the operations on it are optimized out. You can also compare the generated assembly if you know that better.
Source: https://stackoverflow.com/questions/47583729/a-fortran-timing-issue-i-cannot-understand