Shuffling and importing few rows of a saved numpy file

别等时光非礼了梦想. 提交于 2020-06-29 04:11:09

问题


I have 2 saved .npy files:

X_train - (18873, 224, 224, 3) - 21.2GB
Y_train - (18873,) - 148KB

X_train is cats and dogs images (cats being in 1st half and dogs in 2nd half, unshuffled) and is mapped with Y_train as 0 and 1. Thus Y_train is [1,1,1,1,1,1,.........,0,0,0,0,0,0].

I want to import randomly say, 256 images (both cats and dogs images in nearly 50-50%) in X and its mapping in Y. Since the data is large, I cannot import X_train in my RAM.

Thus I have tried (1st approach):

import numpy as np
np.random.seed(666555)
X_train = np.load('Processed/X_train.npy', mmap_mode='r')
X = np.random.shuffle(X_train)
X = X[:256, :, :, :]
Y_train = np.load('Processed/Y_train.npy', mmap_mode='r')
Y = np.random.shuffle(Y_train)
Y = Y[:256]

This gives the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-68-8b2a13921b8d> in <module>
      2 np.random.seed(666555)
      3 X_train = np.load('Processed/X_train.npy', mmap_mode='r')
----> 4 X = np.random.shuffle(X_train)
      5 X = X[:256, :, :, :]
      6 Y_train = np.load('Processed/Y_train.npy', mmap_mode='r')

mtrand.pyx in numpy.random.mtrand.RandomState.shuffle()

mtrand.pyx in numpy.random.mtrand.RandomState.shuffle()

ValueError: assignment destination is read-only

I have also tried (2nd approach):

import numpy as np
np.random.seed(666555)
X = np.memmap('Processed/X_train.npy', 'float64', shape = (256, 224, 224, 3), mode = 'c')
Y = np.memmap('Processed/Y_train.npy', 'float64', shape = (256), mode = 'c')
X = np.random.shuffle(X)
Y = np.random.shuffle(Y)
print(X)
print(Y)

This outputs:

None
None

In 2nd approach, I will get only cats images as np.memmap will collect only 1st 256 images. Then shuffling will be of no use.

Please tell me how to do this with any method.


回答1:


your shuffelling procedure is not correct. following this strategy you are also shuffling your X in a different way from Y (there is no more match between X and Y after shuffle). here a demonstrative example:

np.random.seed(666555)
xxx = np.asarray([1,2,3,4,5,6,7,8,9])
yyy = np.asarray([1,2,3,4,5,6,7,8,9])
np.random.shuffle(xxx)
np.random.shuffle(yyy)

print((yyy == xxx).all()) # False

here the correct procedure:

np.random.seed(666555)
xxx = np.asarray([1,2,3,4,5,6,7,8,9])
yyy = np.asarray([1,2,3,4,5,6,7,8,9])
idx = np.arange(0,len(xxx))
np.random.shuffle(idx)

print((yyy[idx] == xxx[idx]).all()) # True

in this way you also override the None problem



来源:https://stackoverflow.com/questions/62574820/shuffling-and-importing-few-rows-of-a-saved-numpy-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!