What is the best way to get a random sample of the elements of a groupby
? As I understand it, a groupby
is just an iterable over groups.
T
You can take a randoms sample of the unique values of df.some_key.unique()
, use that to slice the df
and finally groupby
on the resultant:
In [337]:
df = pd.DataFrame({'some_key': [0,1,2,3,0,1,2,3,0,1,2,3],
'val': [1,2,3,4,1,5,1,5,1,6,7,8]})
In [338]:
print df[df.some_key.isin(random.sample(df.some_key.unique(),2))].groupby('some_key').mean()
val
some_key
0 1.000000
2 3.666667
If there are more than one groupby keys:
In [358]:
df = pd.DataFrame({'some_key1':[0,1,2,3,0,1,2,3,0,1,2,3],
'some_key2':[0,0,0,0,1,1,1,1,2,2,2,2],
'val': [1,2,3,4,1,5,1,5,1,6,7,8]})
In [359]:
gby = df.groupby(['some_key1', 'some_key2'])
In [360]:
print gby.mean().ix[random.sample(gby.indices.keys(),2)]
val
some_key1 some_key2
1 1 5
3 2 8
But if you are just going to get the values of each group, you don't even need to groubpy
, MultiIndex
will do:
In [372]:
idx = random.sample(set(pd.MultiIndex.from_product((df.some_key1, df.some_key2)).tolist()),
2)
print df.set_index(['some_key1', 'some_key2']).ix[idx]
val
some_key1 some_key2
2 0 3
3 1 5