Randomly sample rows of a dataframe until the desired sum of a column is reached

长情又很酷 2021-01-27 05:21

I have a dataframe like this:

ID  key   acres
1   156   10
2   157   60
3   158   50
4   159   1
5   160   9
6   161   110

and I want to random

1 answer
  • Let's try something like this:

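    • sample draws random rows from the dataframe; with frac=1 it returns 100% of
      the rows in random order, which effectively shuffles the dataframe.

    • Use iterrows to iterate over the shuffled dataframe, keeping a row only if it
      leaves the running total of acres at or below the limit (150 here).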
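To check that sample with frac=1 only reorders rows, a small sketch like the one below may help. It rebuilds the dataframe from the question; the random_state=0 seed is an arbitrary assumption added here so that reruns give the same order, and the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "key": [156, 157, 158, 159, 160, 161],
    "acres": [10, 60, 50, 1, 9, 110],
})

# frac=1 returns every row exactly once, in random order.
# random_state=0 is an arbitrary seed (an assumption, not part of the answer).
shuffled = df.sample(frac=1, random_state=0)

# Same rows and same total; only the order may differ.
same_rows = sorted(shuffled["ID"]) == sorted(df["ID"])
same_total = shuffled["acres"].sum() == df["acres"].sum()
print(same_rows, same_total)
```

Dropping random_state gives a different order on each run, which is what makes the selection random.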

    Code:

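    acres = 0   # running total of selected acres
    obid = []   # IDs of the selected rows
    # sample(frac=1) shuffles the rows; iterrows yields (index, row) pairs
    for _, row in df.sample(frac=1).iterrows():
        # keep the row only if the total stays at or below 150
        if acres + row['acres'] <= 150:
            acres += row['acres']
            obid.append(row['ID'])

    print(obid)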
    

    Output:

    [5, 6, 4, 1]
    

    Let's look at the rows of the original dataframe that were selected:

    print(df[df['ID'].isin(obid)])
    

    Output:

       ID  key  acres
    0   1  156     10
    3   4  159      1
    4   5  160      9
    5   6  161    110
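For a version that can be run end to end, the approach above could be wrapped in a function roughly as follows. This is a sketch, not the answerer's exact code: the function name sample_until, its parameters, and the seed value are hypothetical names introduced here, and the dataframe is rebuilt from the question.

```python
import pandas as pd


def sample_until(df, column, limit, seed=None):
    """Shuffle the rows, then greedily keep each row that still fits under `limit`.

    `sample_until` and its parameters are illustrative names, not a pandas API.
    """
    total = 0
    picked = []  # index labels of the selected rows, in the order they were picked
    # Shuffle only the column of interest; .items() yields (index label, value).
    for idx, value in df[column].sample(frac=1, random_state=seed).items():
        if total + value <= limit:
            total += value
            picked.append(idx)
    return df.loc[picked]


# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "key": [156, 157, 158, 159, 160, 161],
    "acres": [10, 60, 50, 1, 9, 110],
})

subset = sample_until(df, "acres", 150, seed=42)  # seed chosen arbitrarily
print(subset)
print(subset["acres"].sum())
```

As in the original loop, a row that would push the total past the limit is skipped rather than ending the loop, so later, smaller rows can still be added. The total is guaranteed not to exceed the limit, but it is not guaranteed to hit it exactly.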
    