Why is this numpy array too big to load?

野的像风 2021-01-11 13:15

I have a 3.374Gb npz file, myfile.npz.

I can read it in and view the filenames:

a = np.load('myfile.npz')
a.files

gi

1 answer
  • 2021-01-11 14:05

    An np.complex128 array with dimensions (200, 1440, 3, 13, 32) ought to take up about 5.35GiB uncompressed, so if you really did have 8.3GB of free, addressable memory then in principle you ought to be able to load the array.
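
    The size estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the shape and dtype are the ones quoted in this answer, and the variable names are illustrative.

```python
import math

import numpy as np

# Shape and dtype quoted in the answer above.
shape = (200, 1440, 3, 13, 32)
itemsize = np.dtype(np.complex128).itemsize  # bytes per element (two float64 values)

n_elements = math.prod(shape)
n_bytes = n_elements * itemsize

print(f"{n_elements:,} elements")
print(f"{n_bytes / 1e9:.2f} GB ({n_bytes / 2**30:.2f} GiB) uncompressed")
```

    The result is well above 2GB, which is why the compressed size of the .npz file on disk says little about whether the array will fit in memory.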

    However, based on your responses in the comments below, you are using 32 bit versions of Python and numpy. In Windows, a 32 bit process can only address up to 2GB of memory (or 4GB if the binary was compiled with the IMAGE_FILE_LARGE_ADDRESS_AWARE flag; most 32 bit Python distributions are not). Consequently, your Python process is limited to 2GB of address space regardless of how much physical memory you have.
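
    If you are not sure which build you are running, a quick check using only the standard library looks something like this (a minimal sketch; the variable names are illustrative):

```python
import struct
import sys

# Size of a C pointer in the running interpreter, in bits.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(f"This interpreter is a {pointer_bits}-bit build (64-bit: {is_64bit})")
```

    Note that this reports the bitness of the Python process itself, not of the operating system: a 32 bit Python installed on 64 bit Windows is still subject to the limit described above.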

    You can either install 64 bit versions of Python, numpy, and any other Python libraries you need, or live with the 2GB limit and try to work around it. In the latter case you might get away with keeping arrays that exceed the 2GB limit mainly on disk (e.g. using np.memmap), but I'd advise you to go for option #1, since operations on memmapped arrays are in most cases a lot slower than on normal np.arrays that reside wholly in RAM.
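
    For illustration, here is roughly what working with a disk-backed array in chunks looks like. It is a sketch with a deliberately tiny array; the file name big_array.dat, the shape and the chunk size are all made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file; in practice this would live wherever you have disk space.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "big_array.dat")

shape = (1000, 64)  # tiny stand-in for an array that would not fit in RAM
chunk = 100         # number of rows processed at a time

# mode="w+" creates the file on disk and maps it into memory.
arr = np.memmap(path, dtype=np.complex128, mode="w+", shape=shape)
for start in range(0, shape[0], chunk):
    arr[start:start + chunk] = start  # fill one block of rows at a time
arr.flush()  # push pending writes to disk
del arr      # drop the mapping

# Reopen read-only and reduce chunk by chunk, so only one block is touched at once.
ro = np.memmap(path, dtype=np.complex128, mode="r", shape=shape)
total = 0.0
for start in range(0, shape[0], chunk):
    total += ro[start:start + chunk].real.sum()
del ro

print(total)
```

    A raw np.memmap file carries no header, so the dtype and shape have to be supplied again every time the file is opened.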

    If you already have access to another machine with enough RAM to load the whole array into memory, then I would suggest you save the array in a different format (either as a plain np.memmap binary or, perhaps better, in an HDF5 file using PyTables or h5py). It's also possible (although slightly trickier) to extract the problem array from the .npz file without loading it into RAM, so that you can then open it as an np.memmap array residing on disk:

    import numpy as np
    
    # some random sparse (compressible) data
    x = np.random.RandomState(0).binomial(1, 0.25, (1000, 1000))
    
    # save it as a compressed .npz file
    np.savez_compressed('x_compressed.npz', x=x)
    
    # now load it as a numpy.lib.npyio.NpzFile object
    obj = np.load('x_compressed.npz')
    
    # contains a list of the stored arrays in the format '<name>.npy'
    namelist = obj.zip.namelist()
    
    # extract 'x.npy' into the current directory
    obj.zip.extract(namelist[0])
    
    # now we can open the array as a memmap
    x_memmap = np.load(namelist[0], mmap_mode='r+')
    
    # check that x and x_memmap are identical
    assert np.all(x == x_memmap[:])
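
    One caveat, offered as an assumption about your setup rather than something tested on it: a memory map still consumes address space, so a 32 bit process will probably not be able to map a multi-gigabyte .npy file in one go. A possible workaround is to map only a window of rows at a time. The sketch below does this by reading the .npy header with numpy.lib.format and passing the resulting byte offset to np.memmap. The helper name open_npy_window is hypothetical, and it only handles C-ordered files in format version 1.0 or 2.0.

```python
import math
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def open_npy_window(path, start, stop):
    """Map rows [start, stop) of a C-ordered .npy file (hypothetical helper)."""
    with open(path, "rb") as f:
        version = npy_format.read_magic(f)
        if version == (1, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
        elif version == (2, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
        else:
            raise ValueError(f"unsupported .npy format version: {version}")
        data_offset = f.tell()  # the array data starts right after the header

    if fortran_order:
        raise ValueError("this sketch only handles C-ordered arrays")

    stop = min(stop, shape[0])
    row_bytes = dtype.itemsize * math.prod(shape[1:])
    return np.memmap(
        path,
        dtype=dtype,
        mode="r",
        offset=data_offset + start * row_bytes,
        shape=(stop - start,) + tuple(shape[1:]),
    )


# Small demonstration with a made-up array saved as a plain .npy file.
tmp_dir = tempfile.mkdtemp()
npy_path = os.path.join(tmp_dir, "x.npy")
x = np.arange(1000 * 8, dtype=np.float64).reshape(1000, 8)
np.save(npy_path, x)

window = open_npy_window(npy_path, 200, 300)
window_copy = np.array(window)  # copy the 100 mapped rows into ordinary memory
del window                      # release the mapping
```

    Looping over successive windows then lets you process the whole array while only ever mapping a small part of the file.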
    