I've always assumed that numpy uses a kind of pairwise summation, which ensures high precision even for float32 operations:
import numpy as np
I don't really have an explanation, but it seems related to the memory layout. Using Fortran order instead of the default C order, I get the desired output.
>>> np.ones((N,2),dtype=np.float32, order='C').sum(axis=0)
array([16777216., 16777216.], dtype=float32)
>>> np.ones((N,2),dtype=np.float32, order='F').sum(axis=0)
array([17000000., 17000000.], dtype=float32)
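A possible explanation, offered as a sketch rather than a confirmed diagnosis: the Notes in the `numpy.sum` documentation say that the partial pairwise summation is only used when summing along the fast axis in memory. For a C-ordered `(N, 2)` array, `axis=0` is the slow axis, so the rows are presumably accumulated one after another. A plain float32 running sum of ones gets stuck at 2**24 = 16777216, which matches the first result. The snippet below reproduces the mechanism at a smaller scale. It uses float16, which saturates at 2048, purely as a scaled-down stand-in, and `naive_sum` and `pairwise_sum` are hypothetical helpers written for illustration, not NumPy's actual implementation.

```python
import numpy as np

# float32 has a 24-bit significand: above 2**24 the gap between
# representable values is 2, so adding 1.0 is rounded away.
limit = np.float32(2**24)
print(bool(limit + np.float32(1.0) == limit))

def naive_sum(values):
    """Left-to-right accumulation, rounded to the input dtype after each add."""
    acc = values.dtype.type(0)
    for v in values:
        acc = values.dtype.type(acc + v)
    return acc

def pairwise_sum(values):
    """Plain recursive pairwise summation (illustrative, no blocking or unrolling)."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return values.dtype.type(pairwise_sum(values[:mid]) + pairwise_sum(values[mid:]))

# Scaled-down analogue (assumption): float16 stops counting at 2048,
# just as float32 stops at 16777216.
ones = np.ones(3000, dtype=np.float16)
print(naive_sum(ones))     # stuck at the float16 limit
print(pairwise_sum(ones))  # partial sums stay small enough to be exact
```

If that is indeed the cause, passing `dtype=np.float64` to `sum` should also give the correct result without changing the memory layout, since the accumulation is then done in double precision.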