How can I sample equally from a dataframe?

失恋的感觉 2021-02-10 11:47

Suppose I have some observations, each labelled with a class from 1 to n. The classes do not necessarily occur equally often in the data set.

1 Answer
  • 2021-02-10 12:35

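    A more elegant way is to group by class and sample within each group, where sample_size is the number of rows to draw from every class: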

    df.groupby('classes').apply(lambda x: x.sample(sample_size))
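
As a minimal, self-contained sketch of the one-liner above: the toy data and the `value` column are made up for illustration, and capping `sample_size` at the smallest class is an assumption added here so that sampling without replacement cannot fail on a small class. It uses `DataFrameGroupBy.sample` (available in pandas 1.1 and later), which does the same per-group draw without `apply`.

```python
import pandas as pd

# Toy data (hypothetical): three classes with unequal counts.
df = pd.DataFrame({
    "classes": [1] * 6 + [2] * 3 + [3] * 10,
    "value": range(19),
})

# Assumption: cap the per-class size at the smallest class so that
# sampling without replacement never asks for more rows than exist.
sample_size = int(df["classes"].value_counts().min())

# Draw the same number of rows from every class.
balanced = df.groupby("classes").sample(n=sample_size, random_state=0)

print(balanced["classes"].value_counts().sort_index())
```

If a fixed `sample_size` larger than the smallest class is needed, passing `replace=True` to `sample` is one option, at the cost of duplicated rows.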
    

    Extension:

    You can also make sample_size a function of the group size, so that each class is sampled in proportion to its share of the data and every row has the same probability of being selected:

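    nrows = len(df)
    total_sample_size = 10_000
    # len(x) is the number of rows in the group; x.count() would return
    # per-column counts (a Series), which int() cannot convert.
    df.groupby('classes').apply(
        lambda x: x.sample(int(len(x) / nrows * total_sample_size)))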
    

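    The result may not contain exactly total_sample_size rows, because each group's sample size is truncated to an integer, but the class proportions will match the original data more closely than with a fixed per-class sample size.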
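
The proportional variant can be sketched the same way. The toy data and the target of 20 rows below are hypothetical; the sketch swaps the per-group `int(...)` arithmetic for the `frac` argument of `DataFrameGroupBy.sample` (pandas 1.1 and later), which applies the same sampling fraction to every class.

```python
import pandas as pd

# Toy data (hypothetical): class shares of 60%, 30% and 10%.
df = pd.DataFrame({
    "classes": [1] * 60 + [2] * 30 + [3] * 10,
    "value": range(100),
})

total_sample_size = 20  # hypothetical target for the whole sample

# Same fraction for every class, so class shares are preserved.
frac = total_sample_size / len(df)
proportional = df.groupby("classes").sample(frac=frac, random_state=0)

print(proportional["classes"].value_counts().sort_index())
```

With sizes that do not divide evenly, the total can still differ slightly from `total_sample_size`, for the same reason as in the answer's version.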
