Querying a NumPy array of NumPy arrays saved as an npz is slow

余生分开走 2021-01-21 12:20

I generate an npz file as follows:

import numpy as np
import os

# Generate npz file
dataset_text_filepath = 'test_np_load.npz'
texts = []
for text_number in ra         


        
1 Answer
  • 2021-01-21 12:37

    dataset['texts'] reads the array from the file every time it is evaluated. Calling np.load on an npz file only returns a file loader, not the actual data. It is a 'lazy loader' that reads a particular array only when that key is accessed. The load docs could be clearer, but they say:

    - If the file is a ``.npz`` file, the returned value supports the context
      manager protocol in a similar fashion to the open function::
    
        with load('foo.npz') as data:
            a = data['a']
    
      The underlying file descriptor is closed when exiting the 'with' block.
    

    and from the savez docs:

    When opening the saved ``.npz`` file with `load` a `NpzFile` object is
    returned. This is a dictionary-like object which can be queried for
    its list of arrays (with the ``.files`` attribute), and for the arrays
    themselves.
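A small sketch of that dictionary-like interface, assuming a throwaway file name `demo.npz` and toy arrays (neither comes from the question). It also illustrates the lazy behaviour described above: each `data['a']` reads the array from the archive again, so two lookups return two separate array objects rather than a cached one.

```python
import numpy as np

# Toy arrays and file name, purely illustrative (not the question's data)
np.savez('demo.npz', a=np.arange(5), b=np.ones((2, 2)))

with np.load('demo.npz') as data:
    print(data.files)           # names of the stored arrays
    print('a' in data)          # dictionary-like membership test
    first = data['a']           # reads 'a' from the zip archive
    second = data['a']          # reads it again: a new, separate array
    print(first is second)      # nothing is cached between lookups
    print(np.array_equal(first, second))
```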
    

    More details are available via help(np.lib.npyio.NpzFile).
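The practical consequence is to pull the array out of the `NpzFile` once and index the in-memory copy. A minimal sketch, assuming a hypothetical object array of NumPy arrays in place of the question's truncated `texts` data, and a hypothetical file name `texts_demo.npz`. Object arrays are stored with pickle, so `np.load` needs `allow_pickle=True` to read them back.

```python
import numpy as np

# Hypothetical stand-in for the question's data: an object array whose
# elements are NumPy arrays of different lengths.
texts = np.empty(3, dtype=object)
for i in range(3):
    texts[i] = np.arange(i + 1)
np.savez('texts_demo.npz', texts=texts)

with np.load('texts_demo.npz', allow_pickle=True) as dataset:
    # Slow pattern: each dataset['texts'] re-reads and unpickles the
    # whole array from the archive, once per index.
    slow = [dataset['texts'][i] for i in range(3)]
    # Fast pattern: read the array from the archive a single time.
    texts_loaded = dataset['texts']

# texts_loaded is an ordinary in-memory array, still usable after the
# file has been closed, and indexing it does no further file I/O.
fast = [texts_loaded[i] for i in range(len(texts_loaded))]
```

With many lookups, the slow pattern's cost grows with the number of accesses times the size of the stored array, while the fast pattern pays the read cost only once.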
