I'm new to scikit-learn. I'm trying to use preprocessing.OneHotEncoder to encode my training and test data. After encoding, I tried to train a random forest classifier on that data, but fit() fails with the error shown below.
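For context, this is a minimal sketch of what I'm doing. The variable names and the tiny stand-in arrays are only illustrative (my real data is much larger, and the categorical columns are already integer-coded the way OneHotEncoder expects):

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.ensemble import RandomForestClassifier

    # Tiny stand-in data just to make the sketch self-contained
    X_train_raw = np.array([[0, 1], [1, 2], [2, 0], [1, 1]])
    X_cv_raw    = np.array([[0, 2], [2, 1]])
    y_train     = np.array([0, 1, 0, 1])

    encoder = OneHotEncoder()
    encoder.fit(np.vstack((X_train_raw, X_cv_raw)))   # learn categories from train + CV together
    X_train = encoder.transform(X_train_raw)          # returns a scipy.sparse matrix
    X_cv    = encoder.transform(X_cv_raw)

    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)                       # this line raises the TypeError
    preds = model.predict_proba(X_cv)[:, 1]

Here is the error trace: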
     99 model.fit(X_train, y_train)
    100 preds = model.predict_proba(X_cv)[:, 1]
    101

C:\Python27\lib\site-packages\sklearn\ensemble\forest.pyc in fit(self, X, y, sample_weight)
    288
    289         # Precompute some data
--> 290         X, y = check_arrays(X, y, sparse_format="dense")
    291         if (getattr(X, "dtype", None) != DTYPE or
    292                 X.ndim != 2 or

C:\Python27\lib\site-packages\sklearn\utils\validation.pyc in check_arrays(*arrays, **options)
    200             array = array.tocsc()
    201         elif sparse_format == 'dense':
--> 202             raise TypeError('A sparse matrix was passed, but dense '
    203                             'data is required. Use X.toarray() to '
    204                             'convert to a dense numpy array.')

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
I then tried to convert the sparse matrix to a dense one using X.toarray() and X.todense(), but when I do that I get the following error trace instead:
     99 model.fit(X_train.toarray(), y_train)
    100 preds = model.predict_proba(X_cv)[:, 1]
    101

C:\Python27\lib\site-packages\scipy\sparse\compressed.pyc in toarray(self)
    548
    549     def toarray(self):
--> 550         return self.tocoo(copy=False).toarray()
    551
    552 ##############################################################

C:\Python27\lib\site-packages\scipy\sparse\coo.pyc in toarray(self)
    236
    237     def toarray(self):
--> 238         B = np.zeros(self.shape, dtype=self.dtype)
    239         M,N = self.shape
    240         coo_todense(M, N, self.nnz, self.row, self.col, self.data, B.ravel())

ValueError: array is too big.
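For clarity, this is roughly how I attempted the conversion, using the same placeholder variables as in the sketch above. Both variants fail the same way on my real data:

    # Convert the sparse encoded matrix to dense before fitting;
    # on my real data the ValueError above is raised while the
    # dense array is being allocated.
    model.fit(X_train.toarray(), y_train)
    model.fit(np.asarray(X_train.todense()), y_train)
    preds = model.predict_proba(X_cv.toarray())[:, 1]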
Can anyone help me fix this?
Thank you