MemoryError in Python while converting to array

Anonymous (unverified), submitted on 2019-12-03 03:04:01

Question:

My code is shown below:

from sklearn.datasets import load_svmlight_files
import numpy as np

perm1 = np.random.permutation(25000)
perm2 = np.random.permutation(25000)

X_tr, y_tr, X_te, y_te = load_svmlight_files(("dir/file.feat", "dir/file.feat"))

# randomly shuffle data
X_train = X_tr[perm1,:].toarray()[:,0:2000]
y_train = y_tr[perm1]>5  # turn into binary problem

The code works fine until here, but when I try to convert one more object to an array, my program returns a memory error.

Code:

X_test = X_te[perm2,:].toarray()[:,0:2000] 

Error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-7-31f5e4f6b00c> in <module>()
----> 1 X_test = X_test.toarray()

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\compressed.pyc in toarray(self, order, out)
    788     def toarray(self, order=None, out=None):
    789         """See the docstring for `spmatrix.toarray`."""
--> 790         return self.tocoo(copy=False).toarray(order=order, out=out)
    791
    792     ##############################################################

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\coo.pyc in toarray(self, order, out)
    237     def toarray(self, order=None, out=None):
    238         """See the docstring for `spmatrix.toarray`."""
--> 239         B = self._process_toarray_args(order, out)
    240         fortran = int(B.flags.f_contiguous)
    241         if not fortran and not B.flags.c_contiguous:

C:\Users\Asq\AppData\Local\Enthought\Canopy\User\lib\site-packages\scipy\sparse\base.pyc in _process_toarray_args(self, order, out)
    697             return out
    698         else:
--> 699             return np.zeros(self.shape, dtype=self.dtype, order=order)
    700
    701

MemoryError:

I'm new to Python, and I don't know whether a memory error like this has to be fixed manually.

Other parts of my code raise the same error (for example, training with kNN or an ANN).

How can I fix this?

Answer 1:

In cases like these, it's often possible to avoid converting your sparse matrices to dense format.
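The reason the dense conversion is so expensive is that toarray() allocates every entry, zeros included, so the cost grows with rows times columns instead of with the number of stored values. A rough sketch of that arithmetic follows. The helper names `dense_nbytes` and `csr_nbytes`, the synthetic matrix, and the 90000-column width are assumptions made for illustration; the question does not state how many features the file has.

```python
import numpy as np
from scipy import sparse


def dense_nbytes(shape, dtype):
    """Bytes needed for a dense array of this shape and dtype."""
    n_rows, n_cols = shape
    return n_rows * n_cols * np.dtype(dtype).itemsize


def csr_nbytes(X):
    """Bytes held by the three arrays that back a CSR matrix."""
    return X.data.nbytes + X.indices.nbytes + X.indptr.nbytes


# Small synthetic matrix so the comparison runs quickly (sizes are assumed).
X = sparse.random(1000, 5000, density=0.01, format="csr", random_state=0)
small_sparse = csr_nbytes(X)
small_dense = dense_nbytes(X.shape, X.dtype)

# The same arithmetic at the question's scale, without allocating anything.
n_features_assumed = 90000  # hypothetical; the real column count is not given
full_dense = dense_nbytes((25000, n_features_assumed), np.float64)
sliced_dense = dense_nbytes((25000, 2000), np.float64)

print(f"small example: sparse {small_sparse:,} B vs dense {small_dense:,} B")
print(f"full width (assumed): {full_dense / 1e9:.1f} GB")
print(f"first 2000 columns:   {sliced_dense / 1e6:.0f} MB")
```

Under these assumptions, one full-width dense copy is far larger than the 2000-column slice that the code actually keeps, which is consistent with the second toarray() call running out of memory.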

For example, you can do the permutation and slice easily with CSR or CSC sparse formats.

You haven't posted the code that follows, but I suspect that can be made to handle sparse inputs as well. If that's true, your memory issues will no longer be a problem.
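A minimal sketch of the sparse permutation and slice described above, using synthetic data in place of the asker's files (the sizes, the random labels, and the names `n_keep` and `X_train_sparse` are assumptions). Note that the original code calls toarray() before taking the first 2000 columns, so it densifies the full width first; slicing while still sparse avoids that.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Synthetic stand-ins for X_tr and y_tr (shapes and values are assumed).
n_rows, n_cols, n_keep = 500, 3000, 200
X_tr = sparse.random(n_rows, n_cols, density=0.01, format="csr", random_state=0)
y_tr = rng.integers(1, 11, size=n_rows)

perm = rng.permutation(n_rows)

# Row permutation and column slice both return sparse matrices.
X_train_sparse = X_tr[perm, :][:, :n_keep]
y_train = y_tr[perm] > 5  # turn into a binary problem

# Only if a dense array is really required: densify the small slice,
# not the full-width matrix.
X_train_dense = X_train_sparse.toarray()
```

Many scikit-learn estimators accept SciPy sparse input directly, so `X_train_sparse` can often be passed to fit() as it is; whether a particular kNN or neural-network implementation does is worth checking in its documentation.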



Answer 2:

Use numpy.asarray() for an in-place conversion instead of toarray(), which requires new memory.
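This suggestion is worth verifying before relying on it. A quick check, assuming a recent NumPy and SciPy, suggests that numpy.asarray() does not densify a SciPy sparse matrix: it wraps the matrix object in a 0-d object array, so no dense data is produced. The sketch below shows that check on a synthetic matrix, followed by one way to reduce the footprint when a dense array is needed (cut the columns and downcast to float32 before densifying). The sizes and the choice of float32 are assumptions.

```python
import numpy as np
from scipy import sparse

# Synthetic sparse matrix (shape and density are assumed).
X = sparse.random(100, 50, density=0.05, format="csr", random_state=0)

# asarray() wraps the sparse object instead of converting it.
wrapped = np.asarray(X)
print(wrapped.shape, wrapped.dtype)

# toarray() is what actually produces a dense 2-D array.
dense = X.toarray()
print(dense.shape, dense.dtype)

# Lower-memory route: slice columns and downcast while still sparse.
dense_small = X[:, :10].astype(np.float32).toarray()
print(dense_small.shape, dense_small.dtype)
```

Downcasting halves the dense footprint compared with float64, at the cost of precision, so it only fits when the features tolerate it.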


