How to find the importance of the features for a logistic regression model?

后端未结

关注

 1  1787

I have a binary prediction model trained by logistic regression algorithm. I want know which features(predictors) are more important for the decision of positive or negative

相关标签:

1条回答

梦如初夏

2020-12-07 09:53
One of the simplest options to get a feeling for the "influence" of a given parameter in a linear classification model (logistic being one of those), is to consider the magnitude of its coefficient times the standard deviation of the corresponding parameter in the data.

Consider this example:
```
import numpy as np    
from sklearn.linear_model import LogisticRegression

x1 = np.random.randn(100)
x2 = 4*np.random.randn(100)
x3 = 0.5*np.random.randn(100)
y = (3 + x1 + x2 + x3 + 0.2*np.random.randn()) > 0
X = np.column_stack([x1, x2, x3])

m = LogisticRegression()
m.fit(X, y)

# The estimated coefficients will all be around 1:
print(m.coef_)

# Those values, however, will show that the second parameter
# is more influential
print(np.std(X, 0)*m.coef_)
```
An alternative way to get a similar result is to examine the coefficients of the model fit on standardized parameters:
```
m.fit(X / np.std(X, 0), y)
print(m.coef_)
```
Note that this is the most basic approach and a number of other techniques for finding feature importance or parameter influence exist (using p-values, bootstrap scores, various "discriminative indices", etc).

I am pretty sure you would get more interesting answers at https://stats.stackexchange.com/.
0 讨论(0)
发布评论:

提交评论
- 加载中...