I am curious if anyone can explain what exactly leads to the discrepancy in how NumPy handles C-ordered versus Fortran-ordered arrays. See the code.
Floating-point math isn't necessarily associative, i.e. (a+b)+c is not always equal to a+(b+c).
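A quick way to see this in plain Python, no NumPy needed. The values 0.1, 0.2 and 0.3 are just illustrative picks of mine, not taken from the question:

```python
# Same three numbers, two groupings, two different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```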
Since you're adding along different axes, the order of operations is different, which can affect the final result. As a simple example, consider the following matrix, whose exact sum is 1:
import numpy as np

a = np.array([[1e100, 1], [-1e100, 0]])
print(a.sum())   # prints 0.0, the incorrect result
af = np.asfortranarray(a)  # same values, column-major memory layout
print(af.sum())  # prints 1.0
(Interestingly, a.T.sum() still gives 0, as does aT = a.T; aT.sum(), so I'm not sure how exactly this is implemented in the backend.)
The C order uses the sequence of operations (left to right) 1e100 + 1 + (-1e100) + 0, whereas the Fortran order uses 1e100 + (-1e100) + 1 + 0. The problem is that (1e100+1) == 1e100 because floats don't have enough precision to represent that small difference, so the 1 gets lost.
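To convince yourself that the order alone is responsible, here is a rough sketch that replays both sequences with plain Python floats. The helper name running_sum is made up for this illustration, and its explicit loop only mimics a naive left-to-right accumulation; it is not NumPy's actual summation code. math.fsum is thrown in as a standard-library function that keeps track of the low-order bits a naive loop drops.

```python
import math

def running_sum(values):
    """Naive left-to-right accumulation (hypothetical helper, not NumPy's code)."""
    total = 0.0
    for v in values:
        total += v
    return total

c_order = [1e100, 1.0, -1e100, 0.0]  # row by row
f_order = [1e100, -1e100, 1.0, 0.0]  # column by column

print(1e100 + 1 == 1e100)    # True: the 1 is below the precision of 1e100
print(running_sum(c_order))  # 0.0
print(running_sum(f_order))  # 1.0
print(math.fsum(c_order))    # 1.0, regardless of the order
```

The explicit loop is deliberate: depending on the Python version, the built-in sum() may apply compensated summation to floats and hide the effect.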
In general, don't do equality testing on floating-point numbers. Instead, compare using a small epsilon (if abs(float1 - float2) < 0.00001) or use np.isclose. If you need more precision than floats offer, use the decimal module (Decimal, with the context precision set high enough) or a fixed-point representation built on ints.
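A small sketch of both suggestions, using only the standard library. math.isclose is my stand-in for np.isclose here, and the precision of 110 digits is an arbitrary choice that is merely large enough for this example. Note that decimal's default context keeps 28 significant digits, so the 1 would still get lost without raising it.

```python
import math
from decimal import Decimal, localcontext

# Tolerance-based comparison instead of ==
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Same four values as the matrix above, summed in "C order" as decimals
values = [Decimal("1e100"), Decimal(1), Decimal("-1e100"), Decimal(0)]

with localcontext() as ctx:
    ctx.prec = 28                    # decimal's default precision
    low_precision_total = sum(values)

with localcontext() as ctx:
    ctx.prec = 110                   # enough digits to hold 1e100 + 1 exactly
    high_precision_total = sum(values)

print(low_precision_total == 0)      # True: the 1 is lost again
print(high_precision_total == 1)     # True
```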