问题
I'm new in machine learning. I'm preparing my data for classification using Scikit Learn SVM. In order to select the best features I have used the following method:
SelectKBest(chi2, k=10).fit_transform(A1, A2)
Since my dataset consist of negative values, I get the following error:
ValueError Traceback (most recent call last)
/media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>()
----> 1
2
3
4
5
/usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
427 else:
428 # fit method of arity 2 (supervised transformation)
--> 429 return self.fit(X, y, **fit_params).transform(X)
430
431
/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in fit(self, X, y)
300 self._check_params(X, y)
301
--> 302 self.scores_, self.pvalues_ = self.score_func(X, y)
303 self.scores_ = np.asarray(self.scores_)
304 self.pvalues_ = np.asarray(self.pvalues_)
/usr/local/lib/python2.6/dist- packages/sklearn/feature_selection/univariate_selection.pyc in chi2(X, y)
190 X = atleast2d_or_csr(X)
191 if np.any((X.data if issparse(X) else X) < 0):
--> 192 raise ValueError("Input X must be non-negative.")
193
194 Y = LabelBinarizer().fit_transform(y)
ValueError: Input X must be non-negative.
Can someone tell me how can I transform my data ?
回答1:
The error message Input X must be non-negative
says it all: Pearson's chi square test (goodness of fit) does not apply to negative values. It's logical because the chi square test assumes frequencies distribution and a frequency can't be a negative number. Consequently, sklearn.feature_selection.chi2 asserts the input is non-negative.
You are saying that your features are "min, max, mean, median and FFT of accelerometer signal". In many cases, it may be quite safe to simply shift each feature to make it all positive, or even normalize to [0, 1]
interval as suggested by EdChum.
If data transformation is for some reason not possible (e.g. a negative value is an important factor), you should pick another statistic to score your features:
- sklearn.feature_selection.f_classif computes ANOVA f-value
- sklearn.feature_selection.mutual_info_classif computes the mutual information
Since the whole point of this procedure is to prepare the features for another method, it's not a big deal to pick anyone, the end result usually the same or very close.
来源:https://stackoverflow.com/questions/25792012/feature-selection-using-scikit-learn