Memory consumption of NumPy function for standard deviation

误落风尘 2021-01-11 11:39

I'm currently using the Python bindings of GDAL to work on quite large raster data sets (> 4 GB). Since loading them into memory at once is not a feasible solution for me, I re
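
For reference, the kind of block-wise access I mean with the GDAL bindings looks roughly like this (the file name, strip size and single-band assumption are placeholders, not my real setup):

    import numpy as np
    from osgeo import gdal

    def iter_blocks(path, rows_per_chunk=1024):
        """Yield band 1 of a raster as float64 strips so it is never fully in RAM."""
        ds = gdal.Open(path)
        band = ds.GetRasterBand(1)
        for yoff in range(0, band.YSize, rows_per_chunk):
            rows = min(rows_per_chunk, band.YSize - yoff)
            yield band.ReadAsArray(0, yoff, band.XSize, rows).astype(np.float64)

    for block in iter_blocks("huge_raster.tif"):  # placeholder path
        ...  # process one strip at a time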

2 Answers
  • 2021-01-11 11:55

    I doubt you will find any such functions in numpy. The raison d'être of numpy is that it takes advantage of vector processor instruction sets, performing the same instruction on large amounts of data. Basically numpy trades memory efficiency for speed efficiency. However, because plain Python objects are memory intensive, numpy also achieves real memory savings by associating the data type with the array as a whole rather than with each individual element.
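
    To make that last point concrete, a list of one million Python floats carries per-object overhead for every element, whereas the equivalent float64 array stores a single dtype header plus one packed buffer (a rough comparison, exact numbers vary by interpreter and platform):

    import sys
    import numpy as np

    values = [float(i) for i in range(1_000_000)]
    arr = np.arange(1_000_000, dtype=np.float64)

    # list: container pointers plus a full Python float object per element
    list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
    # array: one header plus a contiguous 8-byte-per-element data buffer
    print(list_bytes, arr.nbytes)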

    One way to reduce the memory overhead, while sacrificing only a little speed, is to calculate the standard deviation in chunks, e.g.:

    import numpy as np
    
    def std(arr, blocksize=1000000):
        """Written for py3, change range to xrange for py2.
        This implementation requires the entire array in memory, but it shows how you can
        calculate the standard deviation in a piecemeal way.
        """
        num_blocks, remainder = divmod(len(arr), blocksize)
        mean = arr.mean()
        tmp = np.empty(blocksize, dtype=float)
        total_squares = 0
        for start in range(0, blocksize*num_blocks, blocksize):
            # get a view of the data we want -- views do not "own" the data they point to
            # -- they have minimal memory overhead
            view = arr[start:start+blocksize]
            # in-place operations prevent a new array from being created
            np.subtract(view, mean, out=tmp)
            tmp *= tmp
            total_squares += tmp.sum()
        if remainder:
            # len(arr) % blocksize != 0, so process the last partial block separately
            # create copy of view, with the smallest amount of new memory allocation possible
            # -- one more array *view*
            view = arr[-remainder:]
            tmp = tmp[-remainder:]
            np.subtract(view, mean, out=tmp)
            tmp *= tmp
            total_squares += tmp.sum()
    
        var = total_squares / len(arr)
        sd = var ** 0.5
        return sd
    
    a = np.arange(20e6)
    assert np.isclose(np.std(a), std(a))
    

    Timings show that the larger the blocksize, the closer the chunked version gets to np.std's speed, and the memory overhead is considerably lower (though I'm not sure the %memit figures below are 100% accurate).

    In [70]: %timeit np.std(a)
    10 loops, best of 3: 105 ms per loop
    
    In [71]: %timeit std(a, blocksize=4096)
    10 loops, best of 3: 160 ms per loop
    
    In [72]: %timeit std(a, blocksize=1000000)
    10 loops, best of 3: 105 ms per loop
    
    In [73]: %memit std(a, blocksize=4096)
    peak memory: 360.11 MiB, increment: 0.00 MiB
    
    In [74]: %memit std(a, blocksize=1000000)
    peak memory: 360.11 MiB, increment: 0.00 MiB
    
    In [75]: %memit np.std(a)
    peak memory: 512.70 MiB, increment: 152.59 MiB
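
    (%memit comes from the memory_profiler extension, loaded with %load_ext memory_profiler.)

    The std() above still assumes the whole array is available as a regular ndarray. If the raster values can be dumped to a raw binary file, the same function should run unchanged on a np.memmap, so only the pages currently being touched need to be resident; a sketch with a hypothetical file name and dtype:

    arr = np.memmap('raster_values.dat', dtype=np.float64, mode='r')  # maps the whole file as a 1-D array
    print(std(arr, blocksize=1000000))  # blocks are paged in lazily as they are read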
    
  • 2021-01-11 12:10

    Cython to the rescue! This achieves a nice speed up:

    %%cython
    cimport cython
    cimport numpy as np
    import numpy as np  # the Python-level import is needed for np.nan below
    from libc.math cimport sqrt
    
    @cython.boundscheck(False)
    def std_welford(np.ndarray[np.float64_t, ndim=1] a):
        cdef int n = 0
        cdef float mean = 0
        cdef float M2 = 0
        cdef int a_len = len(a)
        cdef int i
        cdef float delta
        cdef float result
        for i in range(a_len):
            n += 1
            delta = a[i] - mean
            mean += delta / n
            M2 += delta * (a[i] - mean)
        if n < 2:
            result = np.nan
            return result
        else:
            result = sqrt(M2 / (n - 1))
            return result
    

    Using this to test:

    a = np.random.rand(10000).astype(np.float64)
    print(std_welford(a))
    %timeit -n 10 -r 10 std_welford(a)
    

    Cython code

    0.288327455521
    10 loops, best of 10: 59.6 µs per loop
    

    Original code

    0.289605617397
    10 loops, best of 10: 18.5 ms per loop
    

    Numpy std

    0.289493223504
    10 loops, best of 10: 29.3 µs per loop
    

    So that is a speed increase of around 300x over the original pure-Python code, though still not quite as fast as the numpy version. (Note that the Cython function uses single-precision floats for its accumulators and divides by n - 1, which is why its printed result differs slightly from np.std, whose default is ddof=0.)
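
    For reference, the "Original code" timed above is presumably a plain-Python Welford loop; a sketch of such a baseline (not necessarily the exact code that was timed):

    import math

    def std_welford_py(a):
        """Pure-Python Welford recurrence; sample std (ddof=1), NaN for n < 2."""
        n = 0
        mean = 0.0
        M2 = 0.0
        for x in a:
            n += 1
            delta = x - mean
            mean += delta / n
            M2 += delta * (x - mean)
        if n < 2:
            return float('nan')
        return math.sqrt(M2 / (n - 1))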
