What causes the difference in array sums along an axis for C- versus F-ordered arrays in NumPy?

小鲜肉 2021-01-15 01:41

I am curious if anyone can explain what exactly leads to the discrepancy in this particular handling of C versus Fortran ordered arrays in numpy. See the code

2 Answers
  • 2021-01-15 01:56

    Floating-point addition isn't associative in general, i.e. (a+b)+c is not always equal to a+(b+c).

    Since you're adding along different axes, the order of operations is different, which can affect the final result. As a simple example, consider the matrix whose sum is 1.

    import numpy as np

    a = np.array([[1e100, 1], [-1e100, 0]])
    print(a.sum())   # prints 0.0, the incorrect result
    af = np.asfortranarray(a)  # same values, column-major memory layout
    print(af.sum())  # prints 1.0
    

    (Interestingly, a.T.sum() still gives 0, as does aT = a.T; aT.sum(), so I'm not sure how exactly this is implemented in the backend.)

    The C order is using the sequence of operations (left-to-right) 1e100 + 1 + (-1e100) + 0 whereas the Fortran order uses 1e100 + (-1e100) + 1 + 0. The problem is that (1e100+1) == 1e100 because floats don't have enough precision to represent that small difference, so the 1 gets lost.
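    The two orderings can be reproduced without NumPy. The sketch below adds the same four values left to right in plain Python; the helper name left_to_right is made up for illustration and says nothing about how NumPy implements its reductions.

```python
# The same four values as the 2x2 example, listed in the two traversal orders.
c_order = [1e100, 1.0, -1e100, 0.0]   # row by row
f_order = [1e100, -1e100, 1.0, 0.0]   # column by column


def left_to_right(values):
    """Add the values strictly left to right with a single accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


c_total = left_to_right(c_order)
f_total = left_to_right(f_order)

print(c_total)               # 0.0, the 1 is absorbed by 1e100
print(f_total)               # 1.0, the large terms cancel first
print(1e100 + 1.0 == 1e100)  # True
```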

    In general, don't test floating-point numbers for exact equality. Compare them with a small tolerance instead (if abs(float1 - float2) < 0.00001, or np.isclose). If you need more precision than floats offer, use Python's decimal module or a fixed-point representation built on ints.
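    A minimal sketch of both suggestions, using the familiar 0.1 + 0.2 case and the four values from the example above. The precision of 120 digits is an assumption chosen only because it is large enough to hold 1e100 + 1 exactly; the default decimal context (28 digits) would lose the 1 as well.

```python
import math
from decimal import Decimal, localcontext

import numpy as np

x = 0.1 + 0.2
print(x == 0.3)              # False: exact comparison fails
print(math.isclose(x, 0.3))  # True: comparison with a relative tolerance
print(np.isclose(x, 0.3))    # True: NumPy's tolerance-based comparison

values = [1e100, 1.0, -1e100, 0.0]
with localcontext() as ctx:
    ctx.prec = 120  # assumed precision, enough digits for 1e100 + 1
    dec_total = sum(Decimal(v) for v in values)

print(dec_total == 1)        # True: the 1 survives at this precision
```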

  • 2021-01-15 02:12

    This is almost certainly a consequence of numpy sometimes using pairwise summation and sometimes not.

    Let's build a diagnostic array:

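    import numpy as np

    # eps is half the gap between 1.0 and the next float, so 1 + eps rounds back to 1.0
    eps = (np.nextafter(1.0, 2)-1.0) / 2
    1+eps+eps+eps
    # 1.0
    (1+eps)+(eps+eps)
    # 1.0000000000000002
    
    # 32x32 array filled with eps, with a single 1 in the corner
    X = np.full((32, 32), eps)
    X[0, 0] = 1
    X.sum(0)[0]      # first column, reduced along the strided axis
    # 1.0
    X.sum(1)[0]      # first row, reduced along the contiguous axis
    # 1.000000000000003
    X[:, 0].sum()    # first column again, as a 1D view
    # 1.000000000000003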
    

    This strongly suggests that 1D arrays and contiguous axes use pairwise summation while strided axes in a multidimensional array don't.

    Note that the array has to be large enough to see this effect; otherwise numpy falls back to ordinary summation.
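    To illustrate why the summation scheme matters, here is a plain-Python sketch comparing a running sum with a simple recursive pairwise sum. The function names naive_sum and pairwise_sum are hypothetical, and this is not NumPy's actual implementation, only the general idea applied to one row of the diagnostic array.

```python
def naive_sum(xs):
    """Running sum with a single accumulator, left to right."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs):
    """Recursively sum each half, then add the two partial sums."""
    n = len(xs)
    if n == 0:
        return 0.0
    if n == 1:
        return xs[0]
    mid = n // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


eps = 2.0 ** -53                 # half the gap between 1.0 and the next float
row = [1.0] + [eps] * 31         # same values as X[0, :] or X[:, 0]

naive_total = naive_sum(row)
pairwise_total = pairwise_sum(row)

print(naive_total)               # 1.0, every single eps is rounded away
print(pairwise_total > 1.0)      # True, the eps terms accumulate before meeting the 1
```

    In the running sum each eps is added to a value near 1 and rounded away, while the pairwise scheme first combines the small terms with each other, so most of their contribution survives.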
