numpy-memmap

numpy memmap memory usage - want to iterate once

Submitted by 允我心安 on 2021-02-16 13:17:10
Question: Let's say I have some big matrix saved on disk. Storing it all in memory is not really feasible, so I use memmap to access it:

    A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000, 162))

Now let's say I want to iterate over this matrix (not necessarily in order) such that each row will be accessed exactly once:

    p = some_permutation_of_0_to_2999999()

I would like to do something like this:

    start = 0
    end = 3000000
    num_rows_to_load_at_once = some_size_that_will_fit_in_memory()
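
One workable pattern, sketched below under stated assumptions ('big_matrix.dat' is a hypothetical path, and the question's permutation and chunk-size helpers are replaced by concrete stand-ins): walk the permutation in fixed-size chunks, sort each chunk's indices so the disk reads stay close to sequential, and copy only that chunk into RAM with fancy indexing.

    import numpy as np

    filename = 'big_matrix.dat'  # hypothetical path; the file must already exist on disk
    A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000, 162))

    p = np.random.permutation(A.shape[0])  # stand-in for some_permutation_of_0_to_2999999()
    chunk = 100000                         # stand-in for some_size_that_will_fit_in_memory()

    for start in range(0, A.shape[0], chunk):
        # Sorting each slice of the permutation keeps reads mostly sequential
        # while still visiting every row exactly once over the full loop.
        idx = np.sort(p[start:start + chunk])
        block = np.asarray(A[idx])  # fancy indexing copies only this chunk into RAM
        for row in block:
            pass  # per-row work goes here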

How to feed a conv2d net with a large npy file without overwhelming RAM?

Submitted by 丶灬走出姿态 on 2021-01-29 07:36:25
Question: I have a large dataset in .npy format of shape (500000, 18). In order to feed it into a conv2D net using a generator, I split it into X and y and reshape them to (-1, 96, 10, 10, 17) and (-1, 1), respectively. However, when I feed it into the model I get a memory error:

    2020-08-26 14:37:03.691425: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 462080 totalling 451.2KiB
    2020-08-26 14:37:03.691432: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks
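
A common way around this is to leave the arrays on disk and have the generator copy one batch at a time. A minimal sketch, assuming hypothetical files 'X.npy' and 'y.npy' that already hold the reshaped arrays:

    import numpy as np

    def batch_generator(x_path, y_path, batch_size=32):
        X = np.load(x_path, mmap_mode='r')  # rows are read from disk on demand
        y = np.load(y_path, mmap_mode='r')
        n = X.shape[0]
        while True:  # Keras-style generators loop forever
            for start in range(0, n, batch_size):
                stop = start + batch_size
                # np.asarray copies only the current batch into RAM
                yield np.asarray(X[start:stop]), np.asarray(y[start:stop])

    gen = batch_generator('X.npy', 'y.npy', batch_size=32)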

Shuffling and importing a few rows of a saved numpy file

Submitted by 别等时光非礼了梦想. on 2020-06-29 04:11:09
Question: I have 2 saved .npy files:

    X_train - (18873, 224, 224, 3) - 21.2 GB
    Y_train - (18873,) - 148 KB

X_train contains cat and dog images (cats in the 1st half, dogs in the 2nd, unshuffled) and is mapped to Y_train as 0s and 1s, so Y_train is [1,1,1,1,1,1,.........,0,0,0,0,0,0]. I want to import, say, 256 random images (cat and dog images in nearly a 50-50 split) into X along with their mapping in Y. Since the data is large, I cannot load X_train into my RAM. So I have tried (1st approach): import numpy
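
Since np.load can memory-map the file, one way to pull a random subset without touching the RAM limit is sketched below (file names assumed from the question). Because cats fill the first half and dogs the second, a uniform random sample is close to 50-50 in expectation; for an exact split, draw 128 indices from each half separately.

    import numpy as np

    X_train = np.load('X_train.npy', mmap_mode='r')  # stays on disk
    Y_train = np.load('Y_train.npy')                 # 148 KB, fine to load fully

    idx = np.random.choice(X_train.shape[0], size=256, replace=False)
    idx.sort()                    # sorted indices make the disk reads sequential
    X = np.asarray(X_train[idx])  # fancy indexing copies just these 256 images
    Y = Y_train[idx]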

Packing a boolean array needs to go through int (numpy 1.8.2)

Submitted by 痴心易碎 on 2019-12-10 16:18:36
Question: I'm looking for a more compact way to store booleans. numpy internally needs 8 bits to store one boolean, but np.packbits allows packing them, which is pretty cool. The problem is that to pack a 32e6-byte boolean array into a 4e6-byte array, we first need to spend 256e6 bytes converting the boolean array into an int array!

    In [1]: db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)
    In [2]: db_int = np.asarray(db_bool, dtype=int)
    In [3]: db_packed = np.packbits(db_int, axis=0)
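
Casting to uint8 (1 byte per element) instead of the default int (8 bytes) keeps the temporary at 32e6 bytes rather than 256e6, and newer numpy versions accept bool input directly. A minimal sketch of both, with a round-trip check:

    import numpy as np

    db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)

    # uint8 temporary: same size as the bool array, not 8x larger
    db_packed = np.packbits(db_bool.astype(np.uint8), axis=0)

    # On recent numpy, the cast is unnecessary:
    # db_packed = np.packbits(db_bool, axis=0)

    # Round trip: unpackbits returns uint8, so recast to bool before comparing
    restored = np.unpackbits(db_packed, axis=0).astype(bool)
    assert np.array_equal(restored, db_bool)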

Can memmap pandas series. What about a dataframe?

Submitted by 自作多情 on 2019-11-27 20:53:31
It seems that I can memmap the underlying data for a pandas Series by creating an mmap'd ndarray and using it to initialize the Series.

    def assert_readonly(iloc):
        try:
            iloc[0] = 999  # Should be non-editable
            raise Exception("MUST BE READ ONLY (1)")
        except ValueError as e:
            assert "read-only" in str(e)

    # Original ndarray
    n = 1000
    _arr = np.arange(0, 1000, dtype=float)

    # Convert it to a memmap
    mm = np.memmap(filename, mode='w+', shape=_arr.shape, dtype=_arr.dtype)
    mm[:] = _arr[:]
    del _arr
    mm.flush()
    mm.flags['WRITEABLE'] = False  # Make immutable!

    # Wrap as a series
    s = pd.Series(mm, name="a")
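
Whether the Series actually shares the memmap's buffer depends on the pandas version, so a self-contained check is worth running. A minimal sketch, assuming a hypothetical backing file 'series.dat':

    import numpy as np
    import pandas as pd

    filename = 'series.dat'  # hypothetical backing file
    mm = np.memmap(filename, mode='w+', shape=(1000,), dtype=float)
    mm[:] = np.arange(0, 1000, dtype=float)
    mm.flush()
    mm.flags['WRITEABLE'] = False

    s = pd.Series(mm, name="a")
    try:
        s.iloc[0] = 999.0  # fails only if the Series shares the read-only buffer
        print("Series copied the data: the memmap backing is lost")
    except ValueError as e:
        assert "read-only" in str(e)
        print("Series shares the read-only memmap buffer")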