numpy-memmap

numpy memmap memory usage - want to iterate once

Submitted by 允我心安 on 2021-02-16 13:17:10
Question: Let's say I have some big matrix saved on disk. Storing it all in memory is not really feasible, so I use memmap to access it:

    A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000, 162))

Now let's say I want to iterate over this matrix (not necessarily in order) such that each row will be accessed exactly once:

    p = some_permutation_of_0_to_2999999()

I would like to do something like this:

    start = 0
    end = 3000000
    num_rows_to_load_at_once = some_size_that_will_fit_in_memory()
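
One workable pattern, sketched below under stated assumptions ('big_matrix.dat' is a hypothetical path, and the question's permutation and chunk-size helpers are replaced by concrete stand-ins): walk the permutation in fixed-size chunks, sort each chunk's indices so the disk reads stay close to sequential, and copy only that chunk into RAM with fancy indexing.

    import numpy as np

    filename = 'big_matrix.dat'  # hypothetical path; the file must already exist on disk
    A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000, 162))

    p = np.random.permutation(A.shape[0])  # stand-in for some_permutation_of_0_to_2999999()
    chunk = 100000                         # stand-in for some_size_that_will_fit_in_memory()

    for start in range(0, A.shape[0], chunk):
        # Sorting each slice of the permutation keeps reads mostly sequential
        # while still visiting every row exactly once over the full loop.
        idx = np.sort(p[start:start + chunk])
        block = np.asarray(A[idx])  # fancy indexing copies only this chunk into RAM
        for row in block:
            pass  # per-row work goes here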

How to feed a conv2d net with a large npy file without overwhelming RAM?

Submitted by 丶灬走出姿态 on 2021-01-29 07:36:25
Question: I have a large dataset in .npy format of shape (500000, 18). In order to feed it into a conv2D net using a generator, I split it into X and y and reshape them to (-1, 96, 10, 10, 17) and (-1, 1), respectively. However, when I feed it into the model I get a memory error:

    2020-08-26 14:37:03.691425: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 462080 totalling 451.2KiB
    2020-08-26 14:37:03.691432: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks
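
A common way around this is to leave the arrays on disk and have the generator copy one batch at a time. A minimal sketch, assuming hypothetical files 'X.npy' and 'y.npy' that already hold the reshaped arrays:

    import numpy as np

    def batch_generator(x_path, y_path, batch_size=32):
        X = np.load(x_path, mmap_mode='r')  # rows are read from disk on demand
        y = np.load(y_path, mmap_mode='r')
        n = X.shape[0]
        while True:  # Keras-style generators loop forever
            for start in range(0, n, batch_size):
                stop = start + batch_size
                # np.asarray copies only the current batch into RAM
                yield np.asarray(X[start:stop]), np.asarray(y[start:stop])

    gen = batch_generator('X.npy', 'y.npy', batch_size=32)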

Shuffling and importing a few rows of a saved numpy file

Submitted by 别等时光非礼了梦想. on 2020-06-29 04:11:09
Question: I have 2 saved .npy files:

    X_train - (18873, 224, 224, 3) - 21.2 GB
    Y_train - (18873,) - 148 KB

X_train contains cat and dog images (cats in the 1st half, dogs in the 2nd, unshuffled) and is mapped to Y_train as 0s and 1s, so Y_train is [1,1,1,1,1,1,.........,0,0,0,0,0,0]. I want to import, say, 256 random images (cat and dog images in nearly a 50-50 split) into X along with their mapping in Y. Since the data is large, I cannot load X_train into my RAM. So I have tried (1st approach): import numpy
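
Since np.load can memory-map the file, one way to pull a random subset without touching the RAM limit is sketched below (file names assumed from the question). Because cats fill the first half and dogs the second, a uniform random sample is close to 50-50 in expectation; for an exact split, draw 128 indices from each half separately.

    import numpy as np

    X_train = np.load('X_train.npy', mmap_mode='r')  # stays on disk
    Y_train = np.load('Y_train.npy')                 # 148 KB, fine to load fully

    idx = np.random.choice(X_train.shape[0], size=256, replace=False)
    idx.sort()                    # sorted indices make the disk reads sequential
    X = np.asarray(X_train[idx])  # fancy indexing copies just these 256 images
    Y = Y_train[idx]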

Packing a boolean array needs to go through int (numpy 1.8.2)

Submitted by 痴心易碎 on 2019-12-10 16:18:36
Question: I'm looking for a more compact way to store booleans. numpy internally needs 8 bits to store one boolean, but np.packbits allows packing them, which is pretty cool. The problem is that to pack a 32e6-byte boolean array into a 4e6-byte array, we first need to spend 256e6 bytes converting the boolean array into an int array!

    In [1]: db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)
    In [2]: db_int = np.asarray(db_bool, dtype=int)
    In [3]: db_packed = np.packbits(db_int, axis=0)
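
Casting to uint8 (1 byte per element) instead of the default int (8 bytes) keeps the temporary at 32e6 bytes rather than 256e6, and newer numpy versions accept bool input directly. A minimal sketch of both, with a round-trip check:

    import numpy as np

    db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)

    # uint8 temporary: same size as the bool array, not 8x larger
    db_packed = np.packbits(db_bool.astype(np.uint8), axis=0)

    # On recent numpy, the cast is unnecessary:
    # db_packed = np.packbits(db_bool, axis=0)

    # Round trip: unpackbits returns uint8, so recast to bool before comparing
    restored = np.unpackbits(db_packed, axis=0).astype(bool)
    assert np.array_equal(restored, db_bool)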

Can memmap pandas series. What about a dataframe?

Submitted by 自作多情 on 2019-11-27 20:53:31
It seems that I can memmap the underlying data for a pandas Series by creating an mmap'd ndarray and using it to initialize the Series.

    def assert_readonly(iloc):
        try:
            iloc[0] = 999  # Should be non-editable
            raise Exception("MUST BE READ ONLY (1)")
        except ValueError as e:
            assert "read-only" in str(e)

    # Original ndarray
    n = 1000
    _arr = np.arange(0, 1000, dtype=float)

    # Convert it to a memmap
    mm = np.memmap(filename, mode='w+', shape=_arr.shape, dtype=_arr.dtype)
    mm[:] = _arr[:]
    del _arr
    mm.flush()
    mm.flags['WRITEABLE'] = False  # Make immutable!

    # Wrap as a series
    s = pd.Series(mm, name="a")
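
Whether the Series actually shares the memmap's buffer depends on the pandas version, so a self-contained check is worth running. A minimal sketch, assuming a hypothetical backing file 'series.dat':

    import numpy as np
    import pandas as pd

    filename = 'series.dat'  # hypothetical backing file
    mm = np.memmap(filename, mode='w+', shape=(1000,), dtype=float)
    mm[:] = np.arange(0, 1000, dtype=float)
    mm.flush()
    mm.flags['WRITEABLE'] = False

    s = pd.Series(mm, name="a")
    try:
        s.iloc[0] = 999.0  # fails only if the Series shares the read-only buffer
        print("Series copied the data: the memmap backing is lost")
    except ValueError as e:
        assert "read-only" in str(e)
        print("Series shares the read-only memmap buffer")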