Loop vectorization gives different answer

回眸只為那壹抹淺笑 提交于 2020-01-04 09:38:10

问题


I am building some unit tests and find that my code gives a slightly different result when vectorized. In my example case below, an array a is summed in one dimension and added to an initial value x. Most elements of a are too small to change x. The code is:

module datamod
   use ISO_FORTRAN_ENV, only : dp => REAL64
   implicit none

   ! -- Array dimensions are large enough for gfortran to vectorize
   integer, parameter :: N = 6
   integer, parameter :: M = 10
   real(dp) :: x(N), a(N,M)

contains
subroutine init_ax
   ! -- Set a and x so the issue manifests

   x = 0.
   x(1) =  0.1e+03_dp

   a = 0.
   ! -- Each negative component is too small to individually change x(1)
   ! -- But the positive component is just big enough
   a(   1,   1) =  -0.4e-14_dp
   a(   1,   2) =  -0.4e-14_dp
   a(   1,   3) =  -0.4e-14_dp
   a(   1,   4) =   0.8e-14_dp
   a(   1,   5) =  -0.4e-14_dp
end subroutine init_ax
end module datamod

program main
   use datamod, only : a, x, N, M, init_ax
   implicit none
   integer :: i, j

   call init_ax

   ! -- The loop in question
   do i=1,N
      do j=1,M
         x(i) = x(i) + a(i,j)
      enddo
   enddo

   write(*,'(a,e26.18)') 'x(1) is: ', x(1)
end program main

The code gives the following results in gfortran without and with loop vectorization. Note that ftree-vectorize is included in -O3, so the problem manifests when using -O3 also.

mach5% gfortran -O2 main.f90 && ./a.out
x(1) is:   0.100000000000000014E+03
mach5% gfortran -O2 -ftree-vectorize main.f90 && ./a.out
x(1) is:   0.999999999999999858E+02

I know that certain compiler options can change the answer, such as -fassociative-math. However, none of those are included in the standard -O3 optimization package according to the gcc optimization options page.

It seems to me as though the vectorized code is adding up all components of a first, and then adding to x. However, this is incorrect because the code as written requires each component of a to be added to x.

What is going on here? May loop vectorization change the answer in some cases? Gfortran versions 4.7 and 5.3 had the same problem, but Intel 16.0 and PGI 15.10 did not.


回答1:


I copied the code you provided (to a file called test.f90) and then I compiled and ran it using version 4.8.5 of gfortran. I found that results from the -O2 and -O2 -ftree-vectorize options differ just as your results differ. However, when I simply used -O3, I found that the results matched -O2.

$ gfortran --version
GNU Fortran (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

$ gfortran -O2 test.f90 && ./a.out
x(1) is:   0.100000000000000014E+03
$ gfortran -O2 -ftree-vectorize test.f90 && ./a.out
x(1) is:   0.999999999999999858E+02
$ gfortran -O3 test.f90 && ./a.out
x(1) is:   0.100000000000000014E+03


来源:https://stackoverflow.com/questions/40702270/loop-vectorization-gives-different-answer

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!