Question
I have a large dataset in .npy format with shape (500000, 18). To feed it to a Conv2D network through a generator, I split it into X and y and reshape them to (-1, 96, 10, 10, 17) and (-1, 1), respectively. However, when I feed it to the model I get a memory error:
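For reference, the shape arithmetic I have in mind is roughly the following (a minimal sketch only; the exact windowing done by my input_3D helper is not shown, and the assumption here is that each sample is a window of 96*10*10 = 9600 consecutive rows, with the first 17 columns as features and the last column as the target):

import numpy as np

raw = np.random.rand(19200, 18)               # stand-in for a slice of the .npy file
X = raw[:, :17].reshape(-1, 96, 10, 10, 17)   # -> (2, 96, 10, 10, 17)
y = raw[::9600, 17].reshape(-1, 1)            # one target per window -> (2, 1)
print(X.shape, y.shape)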
2020-08-26 14:37:03.691425: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 462080 totalling 451.2KiB
2020-08-26 14:37:03.691432: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 515840 totalling 503.8KiB
2020-08-26 14:37:03.691438: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 22.89GiB
2020-08-26 14:37:03.691445: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 24576286720 memory_limit_: 68719476736 available bytes: 44143190016 curr_region_allocation_bytes_: 34359738368
2020-08-26 14:37:03.691455: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 68719476736
InUse: 24576278528
MaxInUse: 24576278784
NumAllocs: 140334
MaxAllocSize: 268435456
I'm using a GPU with 32 GB of memory.
I have tried different strategies, without success. First, numpy.memmap:
import numpy as np

def Meu_Generador_4(path, batch_size, tempo, janela):
    # memory-map the .npy file so only the rows of the current batch are read from disk
    data = np.load(path, mmap_mode='r')
    total = data.shape[0]
    number_of_batches = total // batch_size
    # create a memmap array on disk as a staging buffer for the output
    # ('w+' creates the file if it does not exist)
    y_output = np.memmap('output', dtype='float64', shape=(total, 18), mode='w+')
    counter = 0
    while 1:
        start = counter * batch_size
        stop = start + batch_size
        y_output[start:stop] = data[start:stop]
        X, y = input_3D(y_output[start:stop], tempo, janela)
        y = y.reshape(-1, 1)
        counter += 1
        yield X.reshape(-1, 96, 10, 10, 17), y
        print('AQUI')
        # restart the counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
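For completeness, this is roughly how I plug the generator into training (a sketch only; model is assumed to be an already-compiled Keras model, and the path and parameter values below are illustrative, not my real ones):

batch_size = 1000
tempo, janela = 96, 10                                   # illustrative values
gen = Meu_Generador_4('data.npy', batch_size, tempo, janela)
model.fit(gen, steps_per_epoch=500000 // batch_size, epochs=10)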
Or Dask delayed arrays:
import dask.array as da

def Meu_Generador_3(path, batch_size, tempo, janela):
    # open the .npy file memory-mapped and wrap it in a dask array,
    # chunked so that each chunk holds one batch of rows
    mmap = np.load(path, mmap_mode='r')
    samples_per_epoch = mmap.shape[0]
    number_of_batches = samples_per_epoch // batch_size
    data = da.from_array(mmap, chunks=(batch_size, 18))
    data = data.to_delayed()          # grid of delayed blocks, one block per chunk
    counter = 0
    while 1:
        # materialise only the block for the current batch
        chunk = da.from_delayed(data[counter][0], shape=(batch_size, 18), dtype=mmap.dtype)
        X, y = input_3D(chunk.compute(), tempo, janela)
        counter += 1
        yield X.reshape(-1, 40, 10, 10, 17), y
        print("AQUI")
        # restart the counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0
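The dask version relies on the to_delayed()/from_delayed() round trip; a small self-contained illustration of that pattern (with made-up sizes) is:

import numpy as np
import dask.array as da

arr = da.from_array(np.arange(100 * 18, dtype='float64').reshape(100, 18),
                    chunks=(25, 18))          # 4 row-chunks of 25 rows each
blocks = arr.to_delayed()                     # object array of delayed blocks, shape (4, 1)
first = da.from_delayed(blocks[0][0], shape=(25, 18), dtype=arr.dtype)
print(first.compute().shape)                  # (25, 18): only this block's task is computed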
I know I could split the file into many smaller files, but I don't want to do that. Thanks
Source: https://stackoverflow.com/questions/63603076/how-to-feed-a-conv2d-net-with-a-large-npy-file-without-overhelming-the-ram-memor