I'm working with a bunch of large numpy arrays, and as these have started to chew up too much memory lately, I wanted to replace them with numpy.memmap instances. The problem is that, now and then, I have to resize the arrays, preferably in place; this works fine for plain arrays, but resize on a memmap fails because the array does not own its data, even with refcheck=False.
The issue is that the OWNDATA flag is False on a freshly created memmap. You can change that by requiring the flag to be True when you create the array:
>>> a = np.require(np.memmap('bla.bin', dtype=int), requirements=['O'])
>>> a.shape
(10,)
>>> a.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
>>> a.resize(20, refcheck=False)
>>> a.shape
(20,)
The only caveat is that np.require may have to copy the data to make sure the requirements are met.
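To see what that copy means in practice (a quick check of my own, not from the original answer): once np.require has copied the data, writes go to the in-memory copy and no longer reach the file:
>>> m = np.memmap('bla.bin', dtype=int)
>>> a = np.require(m, requirements=['O'])
>>> a[0] = 42   # modifies the in-memory copy...
>>> m[0]        # ...while the file-backed array is untouched
0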
Edit to address saving:
If you want to save the resized array to disk, save the memmap to a .npy file with np.save and use np.lib.format.open_memmap to re-open it as a numpy.memmap when you need it again:
>>> a[9] = 1
>>> np.save('bla.npy',a)
>>> b = np.lib.format.open_memmap('bla.npy', dtype=int, mode='r+')
>>> b
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
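As a side note (my addition, not part of the original recipe), np.lib.format.open_memmap can also create the .npy file directly with mode='w+', skipping the np.save round trip:
>>> c = np.lib.format.open_memmap('bla2.npy', mode='w+', dtype=int, shape=(20,))
>>> c[9] = 1
>>> c.flush()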
Edit to offer another method:
You may get close to what you're looking for by resizing the base mmap (a.base or a._mmap, which is addressed in bytes, i.e. as uint8) and "reloading" the memmap:
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> a[3] = 7
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
>>> a.base.resize(20*8)
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)
>>> a
memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
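Note that the 20*8 above hard-codes 8 bytes per element (64-bit ints). A slightly safer spelling of the same step, offered as my own tweak, derives the byte count from the array's dtype:
>>> n_new = 20
>>> a.base.resize(n_new * a.dtype.itemsize)   # mmap sizes are in bytes
>>> a.flush()
>>> a = np.memmap('bla.bin', dtype=int)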
If I'm not mistaken, this achieves essentially what @wwwslinger's second solution does, but without having to manually specify the size of the new memmap in bytes:
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
In [2]: a[3] = 7
In [3]: a
Out[3]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
In [4]: a.flush()
# this will append to the original file as much as is necessary to satisfy
# the new shape requirement, given the specified dtype
In [5]: new_a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(20,))
In [6]: new_a
Out[6]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [7]: a[-1] = 10
In [8]: a
Out[8]: memmap([ 0, 0, 0, 7, 0, 0, 0, 0, 0, 10])
In [9]: a.flush()
In [11]: new_a
Out[11]:
memmap([ 0, 0, 0, 7, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0])
This works well when the new array needs to be bigger than the old one, but I don't think this type of approach will allow for the size of the memory-mapped file to be automatically truncated if the new array is smaller.
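For the growing case, though, the reopen pattern is easy to wrap in a small helper. This is only a sketch, and grow_memmap is my own name, not a numpy function:

import numpy as np

def grow_memmap(arr, new_shape):
    """Flush a memmap and reopen its file with a larger shape.

    numpy zero-pads the file as needed to satisfy the new shape.
    """
    arr.flush()   # make sure pending writes reach the file first
    return np.memmap(arr.filename, mode='r+', dtype=arr.dtype, shape=new_shape)

new_a = grow_memmap(a, (20,))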
Manually resizing the base, as in @wwwslinger's answer, seems to allow the file to be truncated, but it doesn't reduce the size of the array.
For example:
# this creates a memory mapped file of 10 * 8 = 80 bytes
In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
In [2]: a[:] = range(1, 11)
In [3]: a.flush()
In [4]: a
Out[4]: memmap([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# now truncate the file to 40 bytes
In [5]: a.base.resize(5*8)
In [6]: a.flush()
# the array still has the same shape, but the truncated part is all zeros
In [7]: a
Out[7]: memmap([1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
In [8]: b = np.memmap('bla.bin', mode='r+', dtype=int, shape=(5,))
# you still need to create a new np.memmap to change the size of the array
In [9]: b
Out[9]: memmap([1, 2, 3, 4, 5])
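If you need both the file and the array to shrink, one option (my own sketch, not from either answer) is to release the mapping, truncate the file at the OS level, and reopen it:

import numpy as np

a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(10,))
nbytes = 5 * a.dtype.itemsize   # keep the first 5 elements = 40 bytes

del a                           # drop the mapping; CPython closes it right away
with open('bla.bin', 'r+b') as f:
    f.truncate(nbytes)          # shrink the file itself

b = np.memmap('bla.bin', mode='r+', dtype=int)   # shape inferred from file size
# continuing the session above, b is now memmap([1, 2, 3, 4, 5])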