How to set intercept_scaling in scikit-learn LogisticRegression

Submitted by 十年热恋 on 2019-12-11 01:19:33

Question


I am using scikit-learn's LogisticRegression object for regularized binary classification. I've read the documentation on intercept_scaling but I don't understand how to choose this value intelligently.

The datasets look like this:

  • 10-20 features, 300-500 replicates
  • Highly non-Gaussian, in fact most observations are zeros
  • The output classes are not necessarily equally likely. In some cases they are almost 50/50, in other cases they are more like 90/10.
  • Typically C=0.001 gives good cross-validated results.

The documentation contains warnings that the intercept itself is subject to regularization, like every other feature, and that intercept_scaling can be used to address this. But how should I choose this value? One simple answer is to explore many possible combinations of C and intercept_scaling and choose the parameters that give the best performance. But this parameter search will take quite a while and I'd like to avoid that if possible.
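A minimal sketch of the joint search described above, assuming the liblinear solver (the only solver for which intercept_scaling has an effect). The synthetic data, the grid values, and the log-loss scoring are illustrative assumptions, not taken from the question:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for the real data: 400 rows, 15 features, ~90/10 classes.
X, y = make_classification(
    n_samples=400, n_features=15, weights=[0.9, 0.1], random_state=0
)

# Illustrative grid; the values are assumptions, not recommendations.
param_grid = {
    "C": [0.001, 0.01, 0.1, 1.0],
    "intercept_scaling": [1.0, 10.0, 100.0],
}

# intercept_scaling is only used when solver="liblinear".
search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid,
    cv=5,
    scoring="neg_log_loss",
)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```

On data of this size the grid has only 12 combinations, so the search itself is cheap; the cost grows with the number of grid points and folds.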

Ideally, I would like to use the intercept to control the distribution of output predictions. That is, I would like to ensure that the probability that the classifier predicts "class 1" on the training set is equal to the proportion of "class 1" data in the training set. I know that this is the case under certain circumstances, but this is not the case in my data. I don't know if it's due to the regularization or to the non-Gaussian nature of the input data.
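For an unpenalized maximum-likelihood fit that includes an intercept, the mean predicted probability on the training set equals the training proportion of class 1; penalizing the intercept breaks that equality. The sketch below measures the gap for two intercept_scaling values at C=0.001. The synthetic data and the helper name mean_predicted_positive are hypothetical, and the comparison uses the mean predicted probability rather than the share of hard "class 1" predictions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the real data: 400 rows, 15 features, ~90/10 classes.
X, y = make_classification(
    n_samples=400, n_features=15, weights=[0.9, 0.1], random_state=0
)
base_rate = y.mean()  # proportion of class 1 in the training set


def mean_predicted_positive(intercept_scaling):
    """Fit with strong regularization and return the mean P(class 1) on X."""
    clf = LogisticRegression(
        solver="liblinear", C=0.001, intercept_scaling=intercept_scaling
    )
    clf.fit(X, y)
    return clf.predict_proba(X)[:, 1].mean()


# A larger intercept_scaling weakens the effective penalty on the intercept.
gap_default = abs(mean_predicted_positive(1.0) - base_rate)
gap_scaled = abs(mean_predicted_positive(100.0) - base_rate)

print("base rate:", base_rate)
print("gap with intercept_scaling=1:", gap_default)
print("gap with intercept_scaling=100:", gap_scaled)
```

If the gap shrinks as intercept_scaling grows, the mismatch is most likely caused by the penalty on the intercept rather than by the distribution of the inputs.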

Thanks for any suggestions!


Answer 1:


Have you tried oversampling the positive class by setting class_weight="auto"? That effectively oversamples the underrepresented classes and undersamples the majority class.
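A minimal sketch of that suggestion. In later scikit-learn releases the option is spelled class_weight="balanced" ("auto" was deprecated and then removed), so the sketch uses that spelling; the synthetic data is a hypothetical stand-in for the asker's:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the real data: 400 rows, 15 features, ~90/10 classes.
X, y = make_classification(
    n_samples=400, n_features=15, weights=[0.9, 0.1], random_state=0
)

# Unweighted fit: every sample contributes equally to the loss.
plain = LogisticRegression().fit(X, y)

# "balanced" weights each class inversely to its frequency,
# which acts like oversampling the minority class.
balanced = LogisticRegression(class_weight="balanced").fit(X, y)

plain_mean = plain.predict_proba(X)[:, 1].mean()
balanced_mean = balanced.predict_proba(X)[:, 1].mean()

print("mean P(class 1), unweighted:", plain_mean)
print("mean P(class 1), balanced:", balanced_mean)
```

Reweighting pushes the predicted probabilities for the minority class upward, so it moves predictions away from the training base rate; whether that is desirable depends on the goal stated in the question.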

(The current stable docs are a bit confusing since they seem to have been copy-pasted from SVC and not edited for LR; that's just changed in the bleeding edge version.)



Source: https://stackoverflow.com/questions/17711304/how-to-set-intercept-scaling-in-scikit-learn-logisticregression
