If I'm not mistaken, this achieves essentially what @wwwslinger's second solution does, but without having to manually specify the size of the new memmap in bytes:
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
In [2]: a[3] = 7
In [3]: a
Out[3]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
In [4]: a.flush()
# this will append to the original file as much as is necessary to satisfy
# the new shape requirement, given the specified dtype
In [5]: new_a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(20,))
In [6]: new_a
Out[6]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
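# after a flush, writes made through the old memmap are visible through
# the new one, since both objects map the same file on disk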
In [7]: a[-1] = 10
In [8]: a
Out[8]: memmap([ 0, 0, 0, 7, 0, 0, 0, 0, 0, 10])
In [9]: a.flush()
In [11]: new_a
Out[11]: memmap([ 0,  0,  0,  7,  0,  0,  0,  0,  0, 10,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0])
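To make this pattern reusable, here is a minimal sketch of it as a plain function (grow_memmap is a hypothetical name, not part of numpy's API):

import numpy as np

def grow_memmap(filename, dtype, new_shape):
    # re-opening an existing file in 'r+' mode with a larger shape makes
    # numpy pad the file with zero bytes up to the required size
    return np.memmap(filename, mode='r+', dtype=dtype, shape=new_shape)

bigger = grow_memmap('bla.bin', int, (20,))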
This works well when the new array needs to be bigger than the old one, but I don't think this type of approach will automatically truncate the memory-mapped file when the new array is smaller.
Manually resizing the base, as in @wwwslinger's answer, does allow the file to be truncated, but it doesn't reduce the size of the array, so you still have to re-map the file with the smaller shape (a helper combining both steps is sketched after the transcript below).
For example:
# this creates a memory-mapped file of 10 * 8 = 80 bytes
# (assuming the default int dtype has an 8-byte itemsize)
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
In [2]: a[:] = range(1, 11)
In [3]: a.flush()
In [4]: a
Out[4]: memmap([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# now truncate the file to 40 bytes
In [5]: a.base.resize(5*8)
In [6]: a.flush()
# the array still has the same shape, but the truncated part is all zeros
In [7]: a
Out[7]: memmap([1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
# you still need to create a new np.memmap to change the size of the array
In [8]: b = np.memmap('bla.bin', mode='r+', dtype=int, shape=(5,))
In [9]: b
Out[9]: memmap([1, 2, 3, 4, 5])
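Putting both steps together, here is a minimal sketch of a shrinking counterpart (shrink_memmap is a hypothetical name; it assumes any old memmap of the file has already been flushed, so the data to keep is on disk before the file is cut down):

import os
import numpy as np

def shrink_memmap(filename, dtype, new_shape):
    # truncate the file to exactly the bytes needed for the new shape...
    n_bytes = int(np.prod(new_shape)) * np.dtype(dtype).itemsize
    os.truncate(filename, n_bytes)
    # ...and re-map it with the smaller shape
    return np.memmap(filename, mode='r+', dtype=dtype, shape=new_shape)

smaller = shrink_memmap('bla.bin', int, (5,))

Truncating via the file path with os.truncate also avoids reaching into a.base, which is an implementation detail of memmap.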