Question
I have a large dataset in .npy format with shape (500000, 18). To feed it to a Conv2D network through a generator, I split it into X and y and reshape them to (-1, 96, 10, 10, 17) and (-1, 1), respectively. However, when I feed it to the model I get a memory error:
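For reference, the shape arithmetic I have in mind is roughly the following (a minimal sketch only; the exact windowing done by my input_3D helper is not shown, and the assumption here is that each sample is a window of 96*10*10 = 9600 consecutive rows, with the first 17 columns as features and the last column as the target):

import numpy as np

raw = np.random.rand(19200, 18)               # stand-in for a slice of the .npy file
X = raw[:, :17].reshape(-1, 96, 10, 10, 17)   # -> (2, 96, 10, 10, 17)
y = raw[::9600, 17].reshape(-1, 1)            # one target per window -> (2, 1)
print(X.shape, y.shape)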
2020-08-26 14:37:03.691425: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 462080 totalling 451.2KiB
2020-08-26 14:37:03.691432: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 515840 totalling 503.8KiB
2020-08-26 14:37:03.691438: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 22.89GiB
2020-08-26 14:37:03.691445: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 24576286720 memory_limit_: 68719476736 available bytes: 44143190016 curr_region_allocation_bytes_: 34359738368
2020-08-26 14:37:03.691455: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 68719476736
InUse: 24576278528
MaxInUse: 24576278784
NumAllocs: 140334
MaxAllocSize: 268435456
I'm using a GPU with 32 GB of memory.
I have tried different strategies, without success. First, numpy.memmap:
import numpy as np

def Meu_Generador_4(path, batch_size, tempo, janela):
    # memory-map the .npy file so only the rows of the current batch are read from disk
    data = np.load(path, mmap_mode='r')
    total = data.shape[0]
    number_of_batches = total // batch_size
    # create a memmap array on disk as a staging buffer for the output
    # ('w+' creates the file if it does not exist)
    y_output = np.memmap('output', dtype='float64', shape=(total, 18), mode='w+')
    counter = 0
    while 1:
        start = counter * batch_size
        stop = start + batch_size
        y_output[start:stop] = data[start:stop]
        X, y = input_3D(y_output[start:stop], tempo, janela)
        y = y.reshape(-1, 1)
        counter += 1
        yield X.reshape(-1, 96, 10, 10, 17), y
        print('AQUI')
        # restart the counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
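For completeness, this is roughly how I plug the generator into training (a sketch only; model is assumed to be an already-compiled Keras model, and the path and parameter values below are illustrative, not my real ones):

batch_size = 1000
tempo, janela = 96, 10                                   # illustrative values
gen = Meu_Generador_4('data.npy', batch_size, tempo, janela)
model.fit(gen, steps_per_epoch=500000 // batch_size, epochs=10)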
Or Dask delayed arrays:
import dask.array as da

def Meu_Generador_3(path, batch_size, tempo, janela):
    # open the .npy file memory-mapped and wrap it in a dask array,
    # chunked so that each chunk holds one batch of rows
    mmap = np.load(path, mmap_mode='r')
    samples_per_epoch = mmap.shape[0]
    number_of_batches = samples_per_epoch // batch_size
    data = da.from_array(mmap, chunks=(batch_size, 18))
    data = data.to_delayed()          # grid of delayed blocks, one block per chunk
    counter = 0
    while 1:
        # materialise only the block for the current batch
        chunk = da.from_delayed(data[counter][0], shape=(batch_size, 18), dtype=mmap.dtype)
        X, y = input_3D(chunk.compute(), tempo, janela)
        counter += 1
        yield X.reshape(-1, 40, 10, 10, 17), y
        print("AQUI")
        # restart the counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
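The dask version relies on the to_delayed()/from_delayed() round trip; a small self-contained illustration of that pattern (with made-up sizes) is:

import numpy as np
import dask.array as da

arr = da.from_array(np.arange(100 * 18, dtype='float64').reshape(100, 18),
                    chunks=(25, 18))          # 4 row-chunks of 25 rows each
blocks = arr.to_delayed()                     # object array of delayed blocks, shape (4, 1)
first = da.from_delayed(blocks[0][0], shape=(25, 18), dtype=arr.dtype)
print(first.compute().shape)                  # (25, 18): only this block's task is computed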
I know I could split the file into many smaller files, but I don't want to do that. Thanks
Source: https://stackoverflow.com/questions/63603076/how-to-feed-a-conv2d-net-with-a-large-npy-file-without-overhelming-the-ram-memor