I'm currently using the Python bindings of GDAL to work on quite large raster data sets (> 4 GB). Since loading them into memory at once is not a feasible solution for me, I re…
I doubt you will find any such functions in numpy. The raison d'être of numpy is that it takes advantage of vector processor instruction sets, performing the same instruction on large amounts of data. Basically, numpy trades memory efficiency for speed efficiency. However, because plain Python objects are so memory intensive, numpy also achieves certain memory efficiencies by associating the data type with the array as a whole rather than with each individual element.
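To make that per-array dtype point concrete, here is a rough, purely illustrative comparison (exact byte counts vary by platform and Python version, and the names here are my own) of one million integers stored as individual Python objects versus a single int64 array:
import sys
import numpy as np

# A list stores a pointer per element plus a full Python int object per value;
# an ndarray stores one dtype and a single contiguous block of raw 8-byte values.
values = list(range(1_000_000))
arr = np.arange(1_000_000, dtype=np.int64)

list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(f"list of ints : ~{list_bytes / 1e6:.0f} MB")   # roughly 36 MB on CPython 3.x
print(f"int64 ndarray: ~{arr.nbytes / 1e6:.0f} MB")   # 8 MB of raw data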
One way to improve the speed, while still sacrificing some memory overhead, is to calculate the standard deviation in chunks, e.g.:
import numpy as np
def std(arr, blocksize=1000000):
    """Written for py3, change range to xrange for py2.
    This implementation requires the entire array in memory, but it shows how you can
    calculate the standard deviation in a piecemeal way.
    """
    num_blocks, remainder = divmod(len(arr), blocksize)
    mean = arr.mean()
    tmp = np.empty(blocksize, dtype=float)
    total_squares = 0
    for start in range(0, blocksize * num_blocks, blocksize):
        # get a view of the data we want -- views do not "own" the data they point to
        # -- they have minimal memory overhead
        view = arr[start:start + blocksize]
        # in-place operations prevent a new array from being created
        np.subtract(view, mean, out=tmp)
        tmp *= tmp
        total_squares += tmp.sum()
    if remainder:
        # len(arr) % blocksize != 0, so process the last part of the array,
        # slicing off views of the tails of arr and tmp with the smallest amount
        # of new memory allocation possible -- one more array *view* each
        view = arr[-remainder:]
        tmp = tmp[-remainder:]
        np.subtract(view, mean, out=tmp)
        tmp *= tmp
        total_squares += tmp.sum()
    var = total_squares / len(arr)
    sd = var ** 0.5
    return sd
a = np.arange(20e6)
assert np.isclose(np.std(a), std(a))
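If the array itself cannot fit in memory (as with the > 4 GB rasters in the question), the same piecemeal idea can be pushed one step further by memory-mapping the data so that only one block is resident at a time. A minimal two-pass sketch, assuming the values have already been dumped to a raw float64 file; the path, shape argument and function name are placeholders of mine, not from the code above:
import numpy as np

def std_memmap(path, n_values, blocksize=1000000, dtype=np.float64):
    """Two-pass (mean, then squared deviations) standard deviation over a raw
    binary file, touching only `blocksize` values at a time via np.memmap."""
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_values,))

    # First pass: mean
    total = 0.0
    for start in range(0, n_values, blocksize):
        total += data[start:start + blocksize].sum(dtype=np.float64)
    mean = total / n_values

    # Second pass: sum of squared deviations from the mean
    total_squares = 0.0
    for start in range(0, n_values, blocksize):
        chunk = data[start:start + blocksize] - mean   # block-sized temporary
        total_squares += np.dot(chunk, chunk)

    return (total_squares / n_values) ** 0.5   # population std, like std() above
For a GDAL raster you could equally read windows with band.ReadAsArray(xoff, yoff, xsize, ysize) and feed each window into the same running totals, so a full-size array is never materialised.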
Showing the speed-up -- the larger the blocksize, the larger the speed-up -- together with a considerably lower memory overhead (although I'm not entirely sure the memory numbers are 100% accurate):
In [70]: %timeit np.std(a)
10 loops, best of 3: 105 ms per loop
In [71]: %timeit std(a, blocksize=4096)
10 loops, best of 3: 160 ms per loop
In [72]: %timeit std(a, blocksize=1000000)
10 loops, best of 3: 105 ms per loop
In [73]: %memit std(a, blocksize=4096)
peak memory: 360.11 MiB, increment: 0.00 MiB
In [74]: %memit std(a, blocksize=1000000)
peak memory: 360.11 MiB, increment: 0.00 MiB
In [75]: %memit np.std(a)
peak memory: 512.70 MiB, increment: 152.59 MiB
Cython to the rescue! This achieves a nice speed up:
%%cython
cimport cython
cimport numpy as np
import numpy as np
from libc.math cimport sqrt

@cython.boundscheck(False)
def std_welford(np.ndarray[np.float64_t, ndim=1] a):
    cdef int n = 0
    cdef float mean = 0
    cdef float M2 = 0
    cdef int a_len = len(a)
    cdef int i
    cdef float delta
    cdef float result
    for i in range(a_len):
        n += 1
        delta = a[i] - mean
        mean += delta / n
        M2 += delta * (a[i] - mean)
    if n < 2:
        result = np.nan
        return result
    else:
        result = sqrt(M2 / (n - 1))
        return result
Using this to test:
a = np.random.rand(10000).astype(np.float64)
print(std_welford(a))
%timeit -n 10 -r 10 std_welford(a)
Cython code
0.288327455521
10 loops, best of 10: 59.6 µs per loop
Original code
0.289605617397
10 loops, best of 10: 18.5 ms per loop
Numpy std
0.289493223504
10 loops, best of 10: 29.3 µs per loop
So a speed increase of around 300x. Still not as good as the numpy version.
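Since Welford's update only carries a tiny (n, mean, M2) state, it also combines well with the block approach from the first snippet: compute the partial statistics of each block with fast vectorised numpy calls and merge them with the Chan et al. pairwise formula. A rough numpy-only sketch -- the function name and block handling are my own, not from the Cython version:
import numpy as np

def std_welford_blocks(arr, blocksize=1000000):
    """Sample standard deviation computed block by block: each block contributes
    a partial (count, mean, M2) state, merged with Chan's pairwise formula."""
    n = 0
    mean = 0.0
    M2 = 0.0
    for start in range(0, len(arr), blocksize):
        block = np.asarray(arr[start:start + blocksize], dtype=np.float64)
        b_n = block.size
        b_mean = block.mean()
        d = block - b_mean                # block-sized temporary
        b_M2 = np.dot(d, d)

        # Merge the block's partial state into the running state
        delta = b_mean - mean
        total = n + b_n
        M2 += b_M2 + delta * delta * n * b_n / total
        mean += delta * b_n / total
        n = total
    if n < 2:
        return float("nan")
    return (M2 / (n - 1)) ** 0.5          # ddof=1, matching the Cython version
For the test array above, np.isclose(std_welford_blocks(a), np.std(a, ddof=1)) should hold, and each iteration only ever allocates block-sized temporaries, so the same loop works over memory-mapped data or GDAL blocks read one at a time.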