feature-selection

Feature selection using scikit-learn

狂风中的少年 submitted on 2020-01-30 15:48:46
Question: I'm new to machine learning. I'm preparing my data for classification with a Scikit-Learn SVM. To select the best features I used the following method: SelectKBest(chi2, k=10).fit_transform(A1, A2). Since my dataset consists of negative values, I get the following error: ValueError Traceback (most recent call last) /media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>() ----> 1 2 3 4 5 /usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit
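
A common way around this error, assuming a setup like the one above (A1 holding the features, A2 the labels), is either to rescale the features into [0, 1] before applying chi2, since chi2 requires non-negative inputs, or to switch to a score function such as f_classif that accepts negative values. A minimal sketch with synthetic data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2, f_classif

rng = np.random.RandomState(0)
A1 = rng.randn(100, 20)          # features, including negative values
A2 = rng.randint(0, 2, 100)      # class labels

# Option 1: rescale each feature to [0, 1] so chi2's
# non-negativity requirement is satisfied
A1_scaled = MinMaxScaler().fit_transform(A1)
X_chi2 = SelectKBest(chi2, k=10).fit_transform(A1_scaled, A2)

# Option 2: use a score function that accepts negative values
X_f = SelectKBest(f_classif, k=10).fit_transform(A1, A2)

print(X_chi2.shape, X_f.shape)  # (100, 10) (100, 10)
```

Whether rescaling is appropriate depends on how you want chi2's frequency-count interpretation to apply to your data; f_classif avoids the question entirely.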

Doing hyperparameter estimation for the estimator in each fold of Recursive Feature Elimination

筅森魡賤 submitted on 2020-01-13 10:22:23
Question: I am using sklearn to carry out recursive feature elimination with cross-validation, using the RFECV module. RFE involves repeatedly training an estimator on the full set of features, then removing the least informative features, until it converges on the optimal number of features. To obtain optimal performance from the estimator, I want to select the best hyperparameters for the estimator for each number of features (edited for clarity). The estimator is a linear SVM, so I am only
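
One way to re-tune the estimator at each elimination step, assuming a recent scikit-learn (the importance_getter argument was added in 0.24), is to make the RFECV estimator itself a GridSearchCV and point importance_getter at the tuned model's weights. A sketch with synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=15, random_state=0)

# The grid search is re-run every time RFECV refits its estimator
# on a reduced feature set, so C is tuned per number of features
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=3)

# importance_getter tells RFECV where to find the linear weights
# inside the fitted GridSearchCV
selector = RFECV(search, cv=3, importance_getter="best_estimator_.coef_")
selector.fit(X, y)
print(selector.n_features_)
```

Note the cost: this nests a grid search inside every RFE iteration inside every outer CV fold, so it is only practical for small grids and fast estimators.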

Feature selection with caret rfe and training with another method

随声附和 submitted on 2020-01-13 06:41:47
Question: Right now I'm trying to use the caret rfe function to perform feature selection, because I'm in a situation with p >> n and most regression techniques that don't involve some sort of regularisation can't be used well. I have already used a few techniques with regularisation (Lasso), but what I want to try now is to reduce my number of features so that I'm able to run, at least decently, any kind of regression algorithm on it. control <- rfeControl(functions=rfFuncs, method="cv", number=5) model <- rfe
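
The question is about caret, but the same two-step workflow, random-forest-driven RFE for selection followed by a different regression method on the reduced set, can be sketched in scikit-learn (the library used elsewhere on this page) with synthetic p >> n data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# p >> n: many more features than observations
X, y = make_regression(n_samples=50, n_features=500,
                       n_informative=5, random_state=0)

# Step 1: RFE driven by random-forest importances
# (roughly the role caret's rfFuncs plays)
selector = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
               n_features_to_select=10, step=0.2)
X_reduced = selector.fit_transform(X, y)

# Step 2: train any other regression method on the reduced feature set
model = LinearRegression().fit(X_reduced, y)
print(X_reduced.shape)  # (50, 10)
```

The number of features to keep (10 here) is a made-up choice for the sketch; in practice it should itself be validated, e.g. via cross-validation.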

R caret package rfe never finishes error task 1 failed - “replacement has length zero”

和自甴很熟 submitted on 2020-01-10 04:14:09
Question: I recently started to look into the caret package for a model I'm developing. I'm using the latest version. As a first step, I decided to use it for feature selection. The data I'm using has about 760 features and 10k observations. I created a simple function based on the online training material. Unfortunately, I consistently get an error, so the process never finishes. Here is the code that produces the error. In this example I am using a small subset of features; I started with the full set

how to convert mix of text and numerical data to feature data in apache spark

怎甘沉沦 submitted on 2020-01-07 16:34:48
Question: I have a CSV of both textual and numerical data. I need to convert it to feature vector data in Spark (Double values). Is there any way to do that? I have seen examples where each keyword is mapped to some double value and used for the conversion. However, if there are multiple keywords, it is difficult to do it this way. Is there any other way? I see Spark provides Extractors which will convert into feature vectors. Could someone please give an example? 48, Private, 105808, 9th, 5, Widowed, Transport
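
In Spark itself the usual tools are pyspark.ml.feature's StringIndexer (keyword to index), OneHotEncoder, and VectorAssembler. A pure-Python sketch of what the first and last of those do, using the sample row from the question plus a made-up second row:

```python
# Pure-Python sketch of what Spark ML's StringIndexer + VectorAssembler do:
# assign each distinct keyword an index, then assemble one numeric
# vector of doubles per row.
rows = [
    ["48", "Private", "105808", "9th", "5", "Widowed", "Transport"],
    ["39", "Public", "77516", "Bachelors", "13", "Married", "Sales"],
]
categorical_cols = {1, 3, 5, 6}  # columns holding text keywords

# Build an index per categorical column (StringIndexer's job)
indices = {c: {} for c in categorical_cols}
for row in rows:
    for c in categorical_cols:
        indices[c].setdefault(row[c], float(len(indices[c])))

# Assemble feature vectors of doubles (VectorAssembler's job)
vectors = [
    [indices[c][v] if c in categorical_cols else float(v)
     for c, v in enumerate(row)]
    for row in rows
]
print(vectors[0])  # [48.0, 0.0, 105808.0, 0.0, 5.0, 0.0, 0.0]
```

Plain index encoding imposes an arbitrary ordering on keywords, which is why Spark pipelines typically follow StringIndexer with OneHotEncoder before assembling the final vector.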

Recursive Feature Elimination on Keras Models

只愿长相守 submitted on 2020-01-06 05:26:12
Question: I want to apply Recursive Feature Elimination (RFE) to a model built using Keras models and layers. When I run: from sklearn.feature_selection import RFE ## building Keras model ... # rfe = RFE(model, 3) rfe = rfe.fit(X_train, y_train) I get the following error: TypeError: Cannot clone object '<keras.models.Sequential object at 0x7f1ab93eb510>' (type <class 'keras.models.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods. I tried
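
The error says RFE needs a scikit-learn estimator: something exposing get_params (usually inherited from BaseEstimator) plus, for RFE specifically, a coef_ or feature_importances_ attribute after fit. A wrapper such as scikeras's KerasClassifier supplies the estimator API for Keras models, though you still need to expose importances. A minimal stand-in estimator (hypothetical, not Keras) showing the contract RFE checks:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.feature_selection import RFE

class StandInClassifier(BaseEstimator, ClassifierMixin):
    """Minimal estimator satisfying RFE's contract: get_params (via
    BaseEstimator, so clone() works) plus feature_importances_."""

    def fit(self, X, y):
        # Toy importance: absolute correlation of each feature with y
        self.feature_importances_ = np.abs(
            np.corrcoef(X, y, rowvar=False)[:-1, -1])
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        return np.zeros(len(X), dtype=int)

rng = np.random.RandomState(0)
X = rng.randn(60, 8)
y = (X[:, 0] + X[:, 3] > 0).astype(int)

rfe = RFE(StandInClassifier(), n_features_to_select=3).fit(X, y)
print(rfe.support_.sum())  # 3
```

For a real Keras model, the analogous importances might come from permutation importance or first-layer weight norms; neither is something Keras provides out of the box.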

How to interpret and view the complete permutation feature plot in jupyter?

不问归期 submitted on 2020-01-04 07:54:29
Question: I am trying to generate the feature importance plot through Permutation Feature Importance (PFI). I want to make sure that the features returned through different approaches are stable, in order to select optimal features. Can we get a p-value or something of that sort which indicates that a feature is significant? If I could do it with PFI I could be more confident, but the results look entirely opposite. Here is my code to generate the plot: logreg=LogisticRegression(random
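
scikit-learn's permutation_importance does not produce p-values, but its n_repeats shuffles yield a mean and standard deviation per feature, which gives a rough stability check. A sketch with synthetic data and a LogisticRegression like the one in the question:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(random_state=0, max_iter=1000).fit(X_tr, y_tr)

# Shuffle each column n_repeats times and measure the drop
# in held-out accuracy; the spread across repeats is the
# closest built-in analogue to a significance signal
result = permutation_importance(logreg, X_te, y_te,
                                n_repeats=30, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}"
          f" +/- {result.importances_std[i]:.3f}")
```

A feature whose mean importance is within roughly two standard deviations of zero is a weak candidate; for actual p-values you would need a permutation test on the labels, which is a separate procedure.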

Genetic algorithms: fitness function for feature selection algorithm

巧了我就是萌 submitted on 2020-01-02 02:42:08
Question: I have an n x m data set with n observations, where each observation consists of m values for m attributes. Each observation also has an observed result assigned to it. m is big, too big for my task. I am trying to find the best and smallest subset of the m attributes that still represents the whole dataset quite well, so that I could use only these attributes for training a neural network. I want to use a genetic algorithm for this. The problem is the fitness function. It should tell how well the
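
One common shape for the fitness function, sketched here with scikit-learn on synthetic data, is cross-validated accuracy of a model trained on the selected attributes minus a penalty proportional to subset size, so that smaller subsets that predict equally well score higher (the alpha weight is a made-up tuning knob):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=5, random_state=0)

def fitness(mask, alpha=0.01):
    """Score a chromosome (boolean mask over the m attributes):
    cross-validated accuracy minus a per-attribute penalty,
    so smaller subsets win ties."""
    if not mask.any():
        return 0.0  # empty subsets are worthless
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, mask], y, cv=3).mean()
    return acc - alpha * mask.sum()

rng = np.random.RandomState(0)
mask = rng.rand(30) < 0.5  # one random individual in the population
print(round(fitness(mask), 3))
```

The cheap stand-in model (logistic regression) rather than the final neural network keeps each fitness evaluation fast, at the cost of the GA optimising a proxy objective.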