Say that I have a dataframe that looks like:
Name Group_Id
AAA 1
ABC 1
CCC 2
XYZ 2
DEF 3
YYH 3
How could I randomly select one (or m
A very pandas-ish way:
n = 1  # number of rows to sample from each group
takesamp = lambda d: d.sample(n)
df = df.groupby('Group_Id').apply(takesamp)
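For a version you can paste and run, here is a minimal sketch built on the frame from the question. It swaps the apply/lambda pair for the built-in groupby sample method, which I'm assuming is available (it needs pandas 1.1 or later). The random_state argument is only there to make the run repeatable, and the name one_per_group is mine.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# Draw one random row from each group (assumes pandas >= 1.1).
# random_state is fixed only so repeated runs give the same rows.
one_per_group = df.groupby("Group_Id").sample(n=1, random_state=0)
print(one_per_group)
```

The result keeps the original row labels, so you can still trace each sampled row back to df.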
Using groupby and random.choice in a one-liner (plus the import it needs):
import random
df.groupby('Group_Id').apply(lambda x: x.iloc[random.choice(range(0,len(x)))])
The solutions offered above fail if a group has fewer rows than the desired sample size n. This version addresses the problem by capping the sample size at the group's length:
n = 10
df.groupby('Group_Id').apply(lambda x: x.sample(min(n,len(x)))).reset_index(drop=True)
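Another way to get the same "at most n per group" behaviour without apply is to shuffle the whole frame first and then keep the first n rows of each group. This is a sketch of a different technique, not the code above. The extra group 4 with a single row is made up here to show the small-group case, and the name capped is mine.

```python
import pandas as pd

# Frame from the question, plus a hypothetical one-row group (Group_Id 4)
# added only to show what happens when a group is smaller than n.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH", "ZZZ"],
    "Group_Id": [1, 1, 2, 2, 3, 3, 4],
})

n = 2  # maximum number of rows to keep per group

capped = (
    df.sample(frac=1, random_state=0)  # shuffle every row
      .groupby("Group_Id")
      .head(n)                         # keep up to n rows from each group
      .reset_index(drop=True)
)
print(capped)
```

Because the rows are shuffled before head(n) runs, each group contributes a random subset, and a group with fewer than n rows is simply returned whole instead of raising an error.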