feature-selection

Feature selection using scikit-learn

狂风中的少年 submitted on 2020-01-30 15:48:46
Question: I'm new to machine learning. I'm preparing my data for classification with a Scikit-Learn SVM. To select the best features I used the following method: SelectKBest(chi2, k=10).fit_transform(A1, A2). Since my dataset consists of negative values, I get the following error: ValueError Traceback (most recent call last) /media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>() ----> 1 2 3 4 5 /usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit
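
A common way around this error, assuming a setup like the one above (A1 holding the features, A2 the labels), is either to rescale the features into [0, 1] before applying chi2, since chi2 requires non-negative inputs, or to switch to a score function such as f_classif that accepts negative values. A minimal sketch with synthetic data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2, f_classif

rng = np.random.RandomState(0)
A1 = rng.randn(100, 20)          # features, including negative values
A2 = rng.randint(0, 2, 100)      # class labels

# Option 1: rescale each feature to [0, 1] so chi2's
# non-negativity requirement is satisfied
A1_scaled = MinMaxScaler().fit_transform(A1)
X_chi2 = SelectKBest(chi2, k=10).fit_transform(A1_scaled, A2)

# Option 2: use a score function that accepts negative values
X_f = SelectKBest(f_classif, k=10).fit_transform(A1, A2)

print(X_chi2.shape, X_f.shape)  # (100, 10) (100, 10)
```

Whether rescaling is appropriate depends on how you want chi2's frequency-count interpretation to apply to your data; f_classif avoids the question entirely.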

Doing hyperparameter estimation for the estimator in each fold of Recursive Feature Elimination

筅森魡賤 submitted on 2020-01-13 10:22:23
Question: I am using sklearn to carry out recursive feature elimination with cross-validation, using the RFECV module. RFE involves repeatedly training an estimator on the full set of features, then removing the least informative features, until it converges on the optimal number of features. To obtain optimal performance from the estimator, I want to select the best hyperparameters for the estimator for each number of features (edited for clarity). The estimator is a linear SVM, so I am only
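
One way to re-tune the estimator at each elimination step, assuming a recent scikit-learn (the importance_getter argument was added in 0.24), is to make the RFECV estimator itself a GridSearchCV and point importance_getter at the tuned model's weights. A sketch with synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=15, random_state=0)

# The grid search is re-run every time RFECV refits its estimator
# on a reduced feature set, so C is tuned per number of features
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=3)

# importance_getter tells RFECV where to find the linear weights
# inside the fitted GridSearchCV
selector = RFECV(search, cv=3, importance_getter="best_estimator_.coef_")
selector.fit(X, y)
print(selector.n_features_)
```

Note the cost: this nests a grid search inside every RFE iteration inside every outer CV fold, so it is only practical for small grids and fast estimators.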

Feature selection with caret rfe and training with another method

随声附和 submitted on 2020-01-13 06:41:47
Question: Right now I'm trying to use the caret rfe function to perform feature selection, because I'm in a situation with p >> n and most regression techniques that don't involve some sort of regularisation can't be used well. I have already used a few techniques with regularisation (Lasso), but what I want to try now is to reduce my number of features so that I'm able to run, at least decently, any kind of regression algorithm on it. control <- rfeControl(functions=rfFuncs, method="cv", number=5) model <- rfe
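
The question is about caret, but the same two-step workflow, random-forest-driven RFE for selection followed by a different regression method on the reduced set, can be sketched in scikit-learn (the library used elsewhere on this page) with synthetic p >> n data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# p >> n: many more features than observations
X, y = make_regression(n_samples=50, n_features=500,
                       n_informative=5, random_state=0)

# Step 1: RFE driven by random-forest importances
# (roughly the role caret's rfFuncs plays)
selector = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
               n_features_to_select=10, step=0.2)
X_reduced = selector.fit_transform(X, y)

# Step 2: train any other regression method on the reduced feature set
model = LinearRegression().fit(X_reduced, y)
print(X_reduced.shape)  # (50, 10)
```

The number of features to keep (10 here) is a made-up choice for the sketch; in practice it should itself be validated, e.g. via cross-validation.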

R caret package rfe never finishes error task 1 failed - “replacement has length zero”

和自甴很熟 submitted on 2020-01-10 04:14:09
Question: I recently started to look into the caret package for a model I'm developing. I'm using the latest version. As a first step, I decided to use it for feature selection. The data I'm using has about 760 features and 10k observations. I created a simple function based on the online training material. Unfortunately, I consistently get an error, so the process never finishes. Here is the code that produces the error. In this example I am using a small subset of features; I started with the full set

how to convert mix of text and numerical data to feature data in apache spark

怎甘沉沦 submitted on 2020-01-07 16:34:48
Question: I have a CSV of both textual and numerical data. I need to convert it to feature vector data in Spark (Double values). Is there any way to do that? I have seen examples where each keyword is mapped to some double value and used for the conversion. However, if there are multiple keywords, it is difficult to do it this way. Is there any other way? I see Spark provides Extractors which will convert into feature vectors. Could someone please give an example? 48, Private, 105808, 9th, 5, Widowed, Transport
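
In Spark itself the usual tools are pyspark.ml.feature's StringIndexer (keyword to index), OneHotEncoder, and VectorAssembler. A pure-Python sketch of what the first and last of those do, using the sample row from the question plus a made-up second row:

```python
# Pure-Python sketch of what Spark ML's StringIndexer + VectorAssembler do:
# assign each distinct keyword an index, then assemble one numeric
# vector of doubles per row.
rows = [
    ["48", "Private", "105808", "9th", "5", "Widowed", "Transport"],
    ["39", "Public", "77516", "Bachelors", "13", "Married", "Sales"],
]
categorical_cols = {1, 3, 5, 6}  # columns holding text keywords

# Build an index per categorical column (StringIndexer's job)
indices = {c: {} for c in categorical_cols}
for row in rows:
    for c in categorical_cols:
        indices[c].setdefault(row[c], float(len(indices[c])))

# Assemble feature vectors of doubles (VectorAssembler's job)
vectors = [
    [indices[c][v] if c in categorical_cols else float(v)
     for c, v in enumerate(row)]
    for row in rows
]
print(vectors[0])  # [48.0, 0.0, 105808.0, 0.0, 5.0, 0.0, 0.0]
```

Plain index encoding imposes an arbitrary ordering on keywords, which is why Spark pipelines typically follow StringIndexer with OneHotEncoder before assembling the final vector.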

Recursive Feature Elimination on Keras Models

只愿长相守 submitted on 2020-01-06 05:26:12
Question: I want to apply Recursive Feature Elimination (RFE) to a model built using Keras models and layers. When I run: from sklearn.feature_selection import RFE ## building Keras model ... # rfe = RFE(model, 3) rfe = rfe.fit(X_train, y_train) I get the following error: TypeError: Cannot clone object '<keras.models.Sequential object at 0x7f1ab93eb510>' (type <class 'keras.models.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods. I tried
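
The error says RFE needs a scikit-learn estimator: something exposing get_params (usually inherited from BaseEstimator) plus, for RFE specifically, a coef_ or feature_importances_ attribute after fit. A wrapper such as scikeras's KerasClassifier supplies the estimator API for Keras models, though you still need to expose importances. A minimal stand-in estimator (hypothetical, not Keras) showing the contract RFE checks:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.feature_selection import RFE

class StandInClassifier(BaseEstimator, ClassifierMixin):
    """Minimal estimator satisfying RFE's contract: get_params (via
    BaseEstimator, so clone() works) plus feature_importances_."""

    def fit(self, X, y):
        # Toy importance: absolute correlation of each feature with y
        self.feature_importances_ = np.abs(
            np.corrcoef(X, y, rowvar=False)[:-1, -1])
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        return np.zeros(len(X), dtype=int)

rng = np.random.RandomState(0)
X = rng.randn(60, 8)
y = (X[:, 0] + X[:, 3] > 0).astype(int)

rfe = RFE(StandInClassifier(), n_features_to_select=3).fit(X, y)
print(rfe.support_.sum())  # 3
```

For a real Keras model, the analogous importances might come from permutation importance or first-layer weight norms; neither is something Keras provides out of the box.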

How to interpret and view the complete permutation feature plot in jupyter?

不问归期 submitted on 2020-01-04 07:54:29
Question: I am trying to generate the feature importance plot through Permutation Feature Importance (PFI). I want to make sure that the features returned through different approaches are stable, in order to select optimal features. Can we get a p-value or something of that sort which indicates that a feature is significant? If I could do it with PFI I could be more confident, but the results look entirely opposite. Here is my code to generate the plot: logreg=LogisticRegression(random
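
scikit-learn's permutation_importance does not produce p-values, but its n_repeats shuffles yield a mean and standard deviation per feature, which gives a rough stability check. A sketch with synthetic data and a LogisticRegression like the one in the question:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(random_state=0, max_iter=1000).fit(X_tr, y_tr)

# Shuffle each column n_repeats times and measure the drop
# in held-out accuracy; the spread across repeats is the
# closest built-in analogue to a significance signal
result = permutation_importance(logreg, X_te, y_te,
                                n_repeats=30, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}"
          f" +/- {result.importances_std[i]:.3f}")
```

A feature whose mean importance is within roughly two standard deviations of zero is a weak candidate; for actual p-values you would need a permutation test on the labels, which is a separate procedure.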

Genetic algorithms: fitness function for feature selection algorithm

巧了我就是萌 submitted on 2020-01-02 02:42:08
Question: I have an n x m data set with n observations, where each observation consists of m values for m attributes. Each observation also has an observed result assigned to it. m is big, too big for my task. I am trying to find the best and smallest subset of the m attributes that still represents the whole dataset quite well, so that I could use only these attributes for training a neural network. I want to use a genetic algorithm for this. The problem is the fitness function. It should tell how well the
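
One common shape for the fitness function, sketched here with scikit-learn on synthetic data, is cross-validated accuracy of a model trained on the selected attributes minus a penalty proportional to subset size, so that smaller subsets that predict equally well score higher (the alpha weight is a made-up tuning knob):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=5, random_state=0)

def fitness(mask, alpha=0.01):
    """Score a chromosome (boolean mask over the m attributes):
    cross-validated accuracy minus a per-attribute penalty,
    so smaller subsets win ties."""
    if not mask.any():
        return 0.0  # empty subsets are worthless
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, mask], y, cv=3).mean()
    return acc - alpha * mask.sum()

rng = np.random.RandomState(0)
mask = rng.rand(30) < 0.5  # one random individual in the population
print(round(fitness(mask), 3))
```

The cheap stand-in model (logistic regression) rather than the final neural network keeps each fitness evaluation fast, at the cost of the GA optimising a proxy objective.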