Question
I am working on a binary classification problem using logistic regression within a bagging classifier.
A few lines of code are as follows:
model = BaggingClassifier(LogisticRegression(),
                          n_estimators=10,
                          bootstrap=True,
                          random_state=1)
model.fit(X, y, sample_weight=sample_weights)
I am interested in a feature importance metric for this model. How can this be done when the estimator for the bagging classifier is logistic regression?
I am able to get the feature importances when a decision tree is used as the estimator for the bagging classifier. The code for this is as follows:
feature_importances = np.mean([tree.feature_importances_ for tree in model.estimators_], axis=0)
Answer 1:
You can't infer feature importance from linear classifiers directly. What you can do instead is look at the magnitude of the coefficients. You can do that by:
# Average the coefficients across the bagged estimators
model_coeff = np.mean([lr.coef_ for lr in model.estimators_], axis=0)
# Scale the coefficients by each feature's standard deviation
coeff_magnitude = np.std(X, axis=0) * model_coeff
This will tell you roughly how important each coefficient is. In other words, a value >> 0 indicates that the feature pushes predictions toward the positive class, while a value << 0 indicates that it pushes predictions toward the negative class.
Here is sample code based on the values you provided in the comments:
X_train = np.random.rand(2000, 3)
X_train.shape
# (2000, 3)
model_coeff = [[2.233232, 1.22435, 1.433434]]
coeff_magnitude = np.std(X_train, axis=0) * model_coeff
coeff_magnitude.shape
# (1, 3)
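Putting the pieces together, here is a minimal end-to-end sketch. The synthetic data from `make_classification` and all parameter values are illustrative assumptions, since the original `X`, `y`, and `sample_weights` are not shown:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the asker's data (shapes chosen to match the comments)
X, y = make_classification(n_samples=2000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=1)

model = BaggingClassifier(LogisticRegression(max_iter=1000),
                          n_estimators=10,
                          bootstrap=True,
                          random_state=1)
model.fit(X, y)

# Average the per-estimator coefficients, then scale by each feature's
# standard deviation so magnitudes are comparable across features
model_coeff = np.mean([lr.coef_ for lr in model.estimators_], axis=0)
coeff_magnitude = np.std(X, axis=0) * model_coeff

print(coeff_magnitude.shape)
# (1, 3)
```

Features whose scaled coefficients have the largest absolute values contribute most to the decision; the sign indicates which class they push toward.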
Source: https://stackoverflow.com/questions/54519113/feature-importance-in-logistic-regression-with-bagging-classifier