Python Pandas Choosing Random Sample of Groups from Groupby

说谎 2020-12-31 13:31

What is the best way to get a random sample of the elements of a groupby? As I understand it, a groupby is just an iterable over groups.


1 Answer
  • 2020-12-31 13:56

    You can take a random sample of the unique key values (df.some_key.unique()), use it to filter the df, and then groupby on the result:

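    In [337]:
    
    import random
    import pandas as pd
    
    df = pd.DataFrame({'some_key': [0,1,2,3,0,1,2,3,0,1,2,3],
                       'val':      [1,2,3,4,1,5,1,5,1,6,7,8]})
    In [338]:
    
    # random.sample needs a sequence, so convert the unique keys to a list
    keys = random.sample(df.some_key.unique().tolist(), 2)
    print(df[df.some_key.isin(keys)].groupby('some_key').mean())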
                   val
    some_key          
    0         1.000000
    2         3.666667
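
    If you need this more than once, the same idea can be wrapped in a small helper. This is only a sketch: the name sample_groups and the seed argument are my own additions for illustration, not part of pandas.

```python
import random

import pandas as pd


def sample_groups(df, key, n, seed=None):
    """Return the rows of df that belong to n randomly chosen values of key.

    sample_groups and seed are hypothetical names used for this sketch.
    """
    rng = random.Random(seed)  # a private generator keeps the draw reproducible
    chosen = rng.sample(df[key].unique().tolist(), n)
    return df[df[key].isin(chosen)]


df = pd.DataFrame({'some_key': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'val':      [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})

subset = sample_groups(df, 'some_key', 2, seed=0)
means = subset.groupby('some_key').mean()
print(means)
```

    Filtering before the groupby means the aggregation only runs over the sampled groups, which should matter when there are many groups and you want only a few.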
    

    If there is more than one groupby key:

    In [358]:
    
    df = pd.DataFrame({'some_key1':[0,1,2,3,0,1,2,3,0,1,2,3],
                       'some_key2':[0,0,0,0,1,1,1,1,2,2,2,2],
                       'val':      [1,2,3,4,1,5,1,5,1,6,7,8]})
    In [359]:
    
    gby = df.groupby(['some_key1', 'some_key2'])
    In [360]:
    
    # .ix was removed from pandas; .loc accepts a list of key tuples
    print(gby.mean().loc[random.sample(list(gby.indices), 2)])
                         val
    some_key1 some_key2     
    1         1          5.0
    3         2          8.0
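
    If you want the original rows of the sampled groups rather than an aggregate, one option is to label every row with its group number via ngroup() and sample the numbers. A minimal sketch, assuming a reasonably recent pandas; the fixed seed 42 is just for reproducibility:

```python
import random

import pandas as pd

df = pd.DataFrame({'some_key1': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val':       [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})

gby = df.groupby(['some_key1', 'some_key2'])

# ngroup() gives each row the integer id (0 .. ngroups-1) of its group
group_ids = gby.ngroup()

rng = random.Random(42)  # arbitrary seed, chosen only for this example
chosen = rng.sample(range(gby.ngroups), 2)

# keep every row whose group id was drawn
subset = df[group_ids.isin(chosen)]
print(subset)
```

    This avoids building the list of key tuples, since only integers are sampled.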
    

    But if you only want the values of each group, you don't even need to groupby; a MultiIndex will do:

    In [372]:
    
    # sample from the key pairs that actually occur in the data
    indexed = df.set_index(['some_key1', 'some_key2'])
    idx = random.sample(indexed.index.unique().tolist(), 2)
    print(indexed.loc[idx])
                         val
    some_key1 some_key2     
    2         0            3
    3         1            5
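
    The same selection can also be done without the random module, by sampling the distinct key pairs with DataFrame.sample and merging back. Again just a sketch; random_state=0 is an arbitrary choice to make the draw repeatable:

```python
import pandas as pd

df = pd.DataFrame({'some_key1': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val':       [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})

keys = ['some_key1', 'some_key2']

# one row per distinct key pair, then draw two of them
chosen = df[keys].drop_duplicates().sample(n=2, random_state=0)

# an inner merge keeps only the rows whose key pair was drawn
subset = df.merge(chosen, on=keys, how='inner')
print(subset)
```

    Because the pairs come from the data itself, the merge cannot ask for a combination that does not exist.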
    