Python: Random selection per group

2020-12-01 05:08

Say that I have a dataframe that looks like:

Name Group_Id
AAA  1
ABC  1
CCC  2
XYZ  2
DEF  3 
YYH  3

How could I randomly select one (or more) rows for each Group_Id?

  • 2020-12-01 05:34

    For randomly selecting just one row per group, try df.sample(frac=1.0).groupby('Group_Id').head(1)
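A minimal sketch of that one-liner on the question's data: sample(frac=1.0) returns every row in random order, so the first row kept for each group by head(1) is a random pick, and head(n) gives up to n random rows per group. The random_state=0 seed is an assumption added only to make the run reproducible.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

# Shuffle all rows (frac=1.0 keeps every row, in random order).
# random_state is an assumed seed, used only for reproducibility.
shuffled = df.sample(frac=1.0, random_state=0)

# After shuffling, the first row seen for each group is a random pick.
one_per_group = shuffled.groupby('Group_Id').head(1).sort_values('Group_Id')
print(one_per_group)

# head(n) generalizes this to up to n random rows per group.
two_per_group = shuffled.groupby('Group_Id').head(2)
print(two_per_group)
```

Because the shuffle happens once for the whole frame, this stays fully vectorized and avoids a Python-level function call per group.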

  • 2020-12-01 05:37

    There are two simple ways to do this. The first uses nothing but basic pandas syntax:


    This takes 14.4 ms on a 50k-row dataset.

    The other, slightly faster, method uses NumPy:


    This takes 10.9 ms on the same 50k-row dataset.

    Generally speaking, when using pandas it's preferable to stick with its native syntax, especially for beginners.
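The code for the two timed methods did not survive in this answer, so the following is only an assumed illustration, not the author's code: a pandas-only shuffle-and-head version and a NumPy version built on np.random.permutation plus np.unique(..., return_index=True), timed with timeit on a hypothetical 50k-row frame with about 1,000 groups. The frame shape, group count and helper names (pick_pandas, pick_numpy) are all assumptions, and timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 50k-row frame with up to 1,000 groups (sizes are assumptions).
rng = np.random.default_rng(0)
n_rows = 50_000
big = pd.DataFrame({
    'Name': np.arange(n_rows).astype(str),
    'Group_Id': rng.integers(0, 1_000, size=n_rows),
})

def pick_pandas(frame):
    # Pandas only: shuffle every row, then keep the first row of each group.
    return frame.sample(frac=1.0).groupby('Group_Id').head(1)

def pick_numpy(frame):
    # NumPy: randomly permute row positions, then take the first position of
    # each group within that permutation (np.unique reports first occurrences).
    perm = np.random.permutation(len(frame))
    _, first = np.unique(frame['Group_Id'].to_numpy()[perm], return_index=True)
    return frame.iloc[perm[first]]

for fn in (pick_pandas, pick_numpy):
    seconds = timeit.timeit(lambda: fn(big), number=10) / 10
    print(f'{fn.__name__}: {seconds * 1000:.1f} ms per call')
```

Both functions return exactly one uniformly random row per group; the NumPy version skips the groupby machinery entirely.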

  • 2020-12-01 05:40

    You can use a combination of pandas.groupby, pandas.concat and random.sample:

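    import pandas as pd
    import random

    df = pd.DataFrame({
            'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
            'Group_ID': [1, 1, 2, 2, 3, 3],
    })
    grouped = df.groupby('Group_ID')
    # For each group, pick one random index label and select that row with .loc
    # (random.sample needs a real sequence, hence list(d.index))
    df_sampled = pd.concat([d.loc[random.sample(list(d.index), 1)] for _, d in grouped]).reset_index(drop=True)
    print(df_sampled)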


       Group_ID Name
    0         1  AAA
    1         2  XYZ
    2         3  DEF
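The same groupby/concat/random.sample pattern extends to k rows per group without replacement. A minimal sketch, where sample_per_group is a hypothetical helper name; capping k at the group size with min() is a design choice assumed here so that random.sample does not raise ValueError on groups smaller than k.

```python
import random

import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_ID': [1, 1, 2, 2, 3, 3],
})

def sample_per_group(frame, key, k):
    """Hypothetical helper: up to k distinct random rows from each group."""
    parts = []
    for _, group in frame.groupby(key):
        # min() caps k at the group size so random.sample never raises.
        labels = random.sample(list(group.index), min(k, len(group)))
        parts.append(group.loc[labels])
    return pd.concat(parts).reset_index(drop=True)

print(sample_per_group(df, 'Group_ID', 1))  # one row per group
print(sample_per_group(df, 'Group_ID', 5))  # capped: every row of each group
```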
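  • 2020-12-01 05:49

    import numpy as np

    size = 2        # sample size per group
    replace = True  # sample with replacement
    # For each group, draw `size` random index labels and select those rows
    fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
    df.groupby('Group_Id', as_index=False).apply(fn)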
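A variant of the np.random.choice idea that avoids apply, sketched under the assumption that NumPy's newer Generator API (np.random.default_rng) is available: GroupBy.indices maps each group key to the integer positions of its rows, so one can draw positions per group and select everything with a single iloc. The seed 42 is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

size = 2        # rows drawn per group
replace = True  # with replacement, so the same row may be drawn twice
rng = np.random.default_rng(42)  # seeded Generator; the seed is arbitrary

# GroupBy.indices: {group key -> array of integer row positions}
positions = np.concatenate([
    rng.choice(pos, size=size, replace=replace)
    for pos in df.groupby('Group_Id').indices.values()
])
sampled = df.iloc[positions].reset_index(drop=True)
print(sampled)
```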
  • 2020-12-01 05:51

    From pandas 0.16.x onwards, pd.DataFrame.sample returns a random sample of items from an axis of the object.

    In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
      Name  Group_Id
    0  ABC         1
    1  XYZ         2
    2  DEF         3
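In pandas 1.1.0 and later, GroupBy objects also have their own sample method, which does the same thing without apply. A minimal sketch on the question's data; the random_state seed is an assumption added for reproducibility.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

# One random row per group (requires pandas >= 1.1.0).
one_each = df.groupby('Group_Id').sample(n=1, random_state=1)
print(one_each)

# n larger than a group needs replace=True; otherwise a ValueError is raised.
three_each = df.groupby('Group_Id').sample(n=3, replace=True, random_state=1)
print(three_each)
```

The original row index is preserved, so .reset_index(drop=True) can be chained on if a fresh 0..n-1 index is wanted.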
  • 2020-12-01 05:54

    Using random.choice, you can do something like this:

    import random

    name_group = {'AAA': 1, 'ABC': 1, 'CCC': 2, 'XYZ': 2, 'DEF': 3, 'YYH': 3}
    names = list(name_group)  # list of the keys (names) in the name_group dict
    first_name = random.choice(names)
    first_group = name_group[first_name]
    print(first_name, first_group)


    From the random.choice documentation: "Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError."
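The snippet above draws a single name from the whole dict rather than one per group. A minimal sketch that applies the same random.choice idea per group, assuming the same name_group dict: invert it into group -> names, then pick once from each list. The name names_by_group is illustrative.

```python
import random
from collections import defaultdict

name_group = {'AAA': 1, 'ABC': 1, 'CCC': 2, 'XYZ': 2, 'DEF': 3, 'YYH': 3}

# Invert the mapping: group id -> list of names in that group.
names_by_group = defaultdict(list)
for name, group in name_group.items():
    names_by_group[group].append(name)

# random.choice picks one name from each (non-empty) group list.
picked = {group: random.choice(names) for group, names in names_by_group.items()}
print(picked)
```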
