How to perform undersampling (the right way) with python scikit-learn?
问题 I am attempting to perform undersampling of the majority class using python scikit learn. Currently my codes look for the N of the minority class and then try to undersample the exact same N from the majority class. And both the test and training data have this 1:1 distribution as a result. But what I really want is to do this 1:1 distribution on the training data ONLY but test it on the original distribution in the testing data. I am not quite sure how to do the latter as there is some dict