load np.memmap without knowing shape

情到浓时终转凉″ submitted on 2019-12-10 17:14:14

Question


Is it possible to load a numpy.memmap without knowing the shape and still recover the shape of the data?

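import numpy as np

filename = 'data.dat'  # placeholder path; any writable location works

data = np.arange(12, dtype='float32').reshape(3, 4)
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
fp[:] = data[:]
del fp  # deleting the memmap flushes the changes to disk
newfp = np.memmap(filename, dtype='float32', mode='r', shape=(3,4))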

In the last line, I would like to omit the shape and still have newfp come back with shape (3,4), just as it would with joblib.load. Is this possible? Thanks.


Answer 1:


Not unless that information has been explicitly stored in the file somewhere. As far as np.memmap is concerned, the file is just a flat buffer.
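As a minimal sketch of what "flat buffer" means in practice (the scratch path and the variable names flat, n_cols and restored are assumptions for illustration): opening the file without shape gives a one-dimensional array covering the whole file, and the 2-D shape can only be rebuilt if one dimension, here the column count, is already known from somewhere else.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
filename = os.path.join(tempfile.mkdtemp(), 'flat.dat')

data = np.arange(12, dtype='float32').reshape(3, 4)
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 4))
fp[:] = data
del fp  # flush to disk

# Without shape=, np.memmap derives a 1-D length from the file size and dtype.
flat = np.memmap(filename, dtype='float32', mode='r')
print(flat.shape)

# Assumption: the number of columns is known out of band,
# so the row count can be inferred with -1.
n_cols = 4
restored = flat.reshape(-1, n_cols)
print(restored.shape)
```

Note that the dtype also has to be supplied correctly; a wrong dtype silently changes both the inferred length and the values.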

I would recommend using np.save to persist numpy arrays, since this also preserves the metadata specifying their dimensions, dtypes etc. You can also load an .npy file as a memmap by passing the mmap_mode= parameter to np.load.
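A short sketch of that route, assuming a scratch file path chosen only for illustration: np.save writes the shape and dtype into the .npy header, so np.load with the mmap_mode keyword can return a memory-mapped array without being told either.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'data.npy')

data = np.arange(12, dtype='float32').reshape(3, 4)
np.save(path, data)  # the .npy header records shape and dtype

# No shape or dtype is passed here; both come from the file header.
newfp = np.load(path, mmap_mode='r')
print(type(newfp).__name__, newfp.shape, newfp.dtype)
```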

joblib.dump uses a combination of pickling to store generic Python objects and np.save to store numpy arrays.


To initialize an empty memory-mapped array backed by a .npy file you can use numpy.lib.format.open_memmap:

import numpy as np
from numpy.lib.format import open_memmap

# initialize an empty 10TB memory-mapped array
x = open_memmap('/tmp/bigarray.npy', mode='w+', dtype=np.ubyte, shape=(10**13,))

You might be surprised by the fact that this succeeds even if the array is larger than the total available disk space (my laptop only has a 500GB SSD, but I just created a 10TB memmap). This is possible because the file that's created is sparse.
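To tie this back to the original question, here is a small sketch (the tiny array and the scratch path are assumptions for illustration) suggesting that a file created with open_memmap is an ordinary .npy file, so its shape is recovered on reload without being specified.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'small.npy')

# Create the memory-mapped .npy file and fill it in place.
x = open_memmap(path, mode='w+', dtype=np.float32, shape=(3, 4))
x[:] = np.arange(12, dtype=np.float32).reshape(3, 4)
x.flush()
del x

# Reload without passing a shape: either call reads it from the header.
y = np.load(path, mmap_mode='r')
z = open_memmap(path, mode='r')
print(y.shape, z.shape)
```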

Credit for discovering open_memmap should go to kiyo's previous answer here.




Answer 2:


The answer from @ali_m is perfectly valid. I would like to mention my personal preference, in case it helps anyone. I always begin my memmap arrays with the shape as the first 2 elements. Doing this is as simple as:

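# Writing the memmap array
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
fp[:] = data[:]
fp.flush()  # make sure the data is on disk before reopening the file
# Reopen as a flat array with 2 extra slots for the shape
fp = np.memmap(filename, dtype='float32', mode='r+', shape=(14,))
fp[2:] = fp[:-2].copy()  # shift the data right by two elements (copy avoids overlap issues)
fp[:2] = [3, 4]          # store the shape (rows, columns) at the front
del fp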

Or simpler still:

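# Writing the memmap array
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(14,))
fp[2:] = data.ravel()  # flatten the (3,4) array so it fits the 12 remaining slots
fp[:2] = [3, 4]        # store the shape (rows, columns) at the front
del fp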

Then you can easily read the array as:

# Reading the memmap array
newfp = np.memmap(filename, dtype='float32', mode='r')
# The header is stored as float32, so convert it to int before reshaping
row_size, col_size = (int(n) for n in newfp[:2])
newfp = newfp[2:].reshape((row_size, col_size))
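One caveat with storing the shape in the array's own dtype: float32 represents integers exactly only up to 2**24, so very large dimensions would be rounded. A possible variation, which is not what the answer above does, keeps the shape in a separate int64 header and maps the data after it using the offset parameter of np.memmap. The names HEADER_DTYPE, NDIM, write_with_header and read_with_header are hypothetical helpers, and the sketch assumes a fixed number of dimensions known to both writer and reader.

```python
import os
import tempfile

import numpy as np

HEADER_DTYPE = np.dtype('<i8')  # assumption: shape stored as little-endian int64
NDIM = 2                        # assumption: arrays are always 2-D
HEADER_BYTES = NDIM * HEADER_DTYPE.itemsize


def write_with_header(filename, data):
    """Write `data` after an int64 header holding its shape (hypothetical helper)."""
    # mode='w+' creates a file large enough for offset + payload.
    payload = np.memmap(filename, dtype=data.dtype, mode='w+',
                        offset=HEADER_BYTES, shape=data.shape)
    payload[:] = data
    payload.flush()
    del payload
    # Map only the first HEADER_BYTES bytes and store the shape there.
    header = np.memmap(filename, dtype=HEADER_DTYPE, mode='r+', shape=(NDIM,))
    header[:] = data.shape
    header.flush()
    del header


def read_with_header(filename, dtype):
    """Read the shape from the header, then map the payload (hypothetical helper)."""
    header = np.fromfile(filename, dtype=HEADER_DTYPE, count=NDIM)
    shape = tuple(int(n) for n in header)
    return np.memmap(filename, dtype=dtype, mode='r',
                     offset=HEADER_BYTES, shape=shape)


# Usage with a hypothetical scratch file.
filename = os.path.join(tempfile.mkdtemp(), 'with_header.dat')
data = np.arange(12, dtype='float32').reshape(3, 4)
write_with_header(filename, data)
newfp = read_with_header(filename, dtype='float32')
print(newfp.shape)
```

The dtype still has to be known when reading; recording it as well is essentially what the .npy format already does.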



Answer 3:


An alternative to numpy.memmap is tifffile.memmap:

from tifffile import memmap
newArray = memmap("name", shape=(3,3), dtype='uint8')
newArray[1,1] = 11
del newArray

This creates the file with the following values:

0  0  0
0  11 0
0  0  0  

Now let's read it back:

array = memmap("name", dtype='uint8')
print(array.shape) # prints (3, 3)
print(array)

prints:

0  0  0
0  11 0
0  0  0


Source: https://stackoverflow.com/questions/36749082/load-np-memmap-without-knowing-shape
