Scikit-learn balanced subsampling

前端未结

I'm trying to create N balanced random subsamples of my large unbalanced dataset. Is there a way to do this simply with scikit-learn / pandas or do I have to implement it m

13 Answers

心在旅途

2020-12-02 11:20

This type of data splitting is not among the built-in splitters exposed in sklearn.cross_validation (that module was replaced by sklearn.model_selection in scikit-learn 0.18 and removed in 0.20).

The closest match to your needs is sklearn.cross_validation.StratifiedShuffleSplit, which can generate subsamples of any size while retaining the structure of the whole dataset, i.e. it deliberately preserves the same class imbalance that is in your main dataset. That is not what you are looking for, but you may be able to reuse its code and change the imposed ratio so that it is always 50/50.
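To see why StratifiedShuffleSplit does not solve the problem on its own, here is a minimal sketch. It assumes a recent scikit-learn, where the class is imported from sklearn.model_selection, and it uses made-up toy labels (90 negatives, 10 positives) that are not from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data, invented for illustration: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)

# Draw 3 random subsamples of 20 rows each (integer sizes are row counts).
splitter = StratifiedShuffleSplit(
    n_splits=3, train_size=20, test_size=20, random_state=0
)

class_counts = []
for subsample_idx, _ in splitter.split(X, y):
    # Count how many rows of each class ended up in this subsample.
    counts = np.bincount(y[subsample_idx])
    class_counts.append(counts.tolist())
    print(counts)
```

Each subsample keeps the original 9:1 class ratio, which is exactly the imbalance a balanced subsample is supposed to remove.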

(A balanced splitter of this kind would probably be a very good contribution to scikit-learn if you feel up to it.)
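If you do end up implementing it manually, a plain NumPy version is short. The following is only a sketch under stated assumptions, not scikit-learn API: the helper name balanced_subsamples and its parameters are hypothetical, each class is undersampled without replacement to the same count, and the toy labels are invented.

```python
import numpy as np


def balanced_subsamples(y, n_subsamples, per_class=None, seed=None):
    """Yield index arrays that each contain `per_class` rows of every class.

    Hypothetical helper for illustration; not part of scikit-learn or pandas.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    if per_class is None:
        # Default: use as many rows per class as the rarest class has.
        per_class = int(counts.min())
    if per_class > counts.min():
        raise ValueError("per_class exceeds the size of the smallest class")
    for _ in range(n_subsamples):
        # Sample each class independently, without replacement.
        picks = [
            rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
            for c in classes
        ]
        idx = np.concatenate(picks)
        rng.shuffle(idx)  # avoid returning the rows grouped by class
        yield idx


# Toy labels, invented for illustration: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
subsamples = list(balanced_subsamples(y, n_subsamples=5, seed=0))
for idx in subsamples:
    print(np.bincount(y[idx]))
```

With a pandas DataFrame df whose rows line up with y, each subsample can then be taken as df.iloc[idx]. Sampling without replacement caps every subsample at the size of the rarest class times the number of classes; oversampling the minority class instead would need replace=True and a different size check.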
