How can I sample equally from a dataframe?


失恋的感觉 2021-02-10 11:47

Suppose I have some observations, each labeled with a class from 1 to n. The classes do not necessarily occur equally often in the data set.

1 answer

傲寒 (original poster)

2021-02-10 12:35
For a more elegant solution, you can group by class and draw the same number of rows from each group:
```
# Draw sample_size rows (an integer you choose) from every class
df.groupby('classes').apply(lambda x: x.sample(sample_size))
```
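As a minimal, self-contained sketch of the equal-sampling idea: the toy data, the `value` column, and the helper name `sample_equally` are assumptions made up for illustration. It uses `DataFrameGroupBy.sample` (available in pandas 1.1 and later) instead of `apply`, which should give the same kind of result.

```python
import pandas as pd


def sample_equally(df, class_col, sample_size, seed=None):
    """Draw sample_size rows from each class (hypothetical helper)."""
    # Sampling is without replacement by default, so every class
    # must contain at least sample_size rows.
    return df.groupby(class_col).sample(n=sample_size, random_state=seed)


# Toy, imbalanced data: class 1 has 50 rows, class 2 has 30, class 3 has 20.
df = pd.DataFrame({
    "classes": [1] * 50 + [2] * 30 + [3] * 20,
    "value": range(100),
})

balanced = sample_equally(df, "classes", sample_size=10, seed=0)

# Number of sampled rows per class
counts = balanced["classes"].value_counts().to_dict()
print(counts)
```

If some class has fewer rows than `sample_size`, this raises an error; passing `replace=True` to `sample` is one way around that, at the cost of duplicated rows.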
Extension:

You can make sample_size a function of the group size, so that every row has the same probability of being selected (that is, the classes are sampled proportionately):
```
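nrows = len(df)
total_sample_size = 10_000
# Each class gets a share of the sample proportional to its size.
# len(x) is the group's row count; x.count() would return per-column counts.
df.groupby('classes').\
    apply(lambda x: x.sample(int((len(x) / nrows) * total_sample_size)))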
```
Because each group's share is truncated to an integer, the result won't contain exactly total_sample_size rows, but the classes will be represented more proportionally than with the naive method.
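To make the truncation effect concrete, here is a small sketch under assumed toy data (the class sizes, the `value` column, and the helper name `proportional_sample` are all made up for illustration). It loops over the groups and concatenates the pieces rather than calling `apply`, but the per-group sample size is computed the same way.

```python
import pandas as pd

# Toy, imbalanced data: 500, 333 and 167 rows for classes 1, 2 and 3.
df = pd.DataFrame({
    "classes": [1] * 500 + [2] * 333 + [3] * 167,
    "value": range(1000),
})

nrows = len(df)
total_sample_size = 100


def proportional_sample(group):
    """Sample a share of rows proportional to the group's size (hypothetical helper)."""
    # int() truncates, so fractional shares are rounded down.
    n = int(len(group) / nrows * total_sample_size)
    return group.sample(n, random_state=0)


sampled = pd.concat([proportional_sample(g) for _, g in df.groupby("classes")])

# Number of sampled rows per class, and the overall total
counts = sampled["classes"].value_counts().to_dict()
print(counts, len(sampled))
```

With these toy numbers the truncated shares are 50, 33 and 16, so 99 rows come back rather than 100. If the exact total matters, one option is to hand the leftover rows to the groups with the largest fractional remainders.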