Python: Random selection per group

面向向阳花 2020-12-01 05:08

Say that I have a dataframe that looks like:

Name Group_Id
AAA  1
ABC  1
CCC  2
XYZ  2
DEF  3 
YYH  3

How could I randomly select one (or m

9 answers
  • 2020-12-01 05:34

    To randomly select just one row per group, try df.sample(frac=1.0).groupby('Group_Id').head(1), which shuffles all rows and then keeps the first row of each group.
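The shuffle-then-head idea also extends to m rows per group by changing the argument to head. A minimal sketch, assuming the question's Name/Group_Id columns; the variable m and the fixed random_state are illustrative choices, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

m = 1  # rows to keep per group (hypothetical parameter)

# Shuffle every row (random_state makes it reproducible), then keep the
# first m rows of each group; sort afterwards only for readability.
sampled = (
    df.sample(frac=1.0, random_state=0)
      .groupby("Group_Id")
      .head(m)
      .sort_values("Group_Id")
)
print(sampled)
```

Because head(m) simply truncates each shuffled group, groups with fewer than m rows return all of their rows instead of raising an error.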

  • 2020-12-01 05:37

    There are two simple ways to do this. The first uses nothing but basic pandas syntax ('x' is the grouping column, 'y' the column being sampled):

    df[['x','y']].groupby('x').agg(pd.DataFrame.sample)
    

    This takes 14.4 ms on a 50k-row dataset.

    The other, slightly faster, method uses NumPy:

    df[['x','y']].groupby('x').agg(np.random.choice)
    

    This takes 10.9 ms on the same 50k-row dataset.

    Generally speaking, when using pandas it's preferable to stick with its native syntax, especially for beginners.
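Applied to the question's own columns instead of the placeholder 'x'/'y', the NumPy variant might look like the sketch below. It assumes the Name/Group_Id frame from the question; the np.random.seed call is only there to make the draw reproducible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

np.random.seed(0)  # reproducible draws (illustrative)

# For each Group_Id, np.random.choice receives that group's Name values
# and returns a single randomly chosen element.
picked = df.groupby("Group_Id")["Name"].agg(np.random.choice)
print(picked)
```

The result is a Series indexed by Group_Id with one Name per group; only the sampled column survives, so this suits picking a single value rather than whole rows.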

  • 2020-12-01 05:40

    You can use a combination of pandas.groupby, pandas.concat and random.sample:

    import pandas as pd
    import random
    
    df = pd.DataFrame({
            'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
            'Group_ID': [1,1,2,2,3,3]
         })
    
    grouped = df.groupby('Group_ID')
    # Pick one random row label from each group's index and select it with .loc
    df_sampled = pd.concat([d.loc[random.sample(list(d.index), 1)] for _, d in grouped]).reset_index(drop=True)
    print(df_sampled)
    

    Output:

       Group_ID Name
    0         1  AAA
    1         2  XYZ
    2         3  DEF
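On pandas 1.1 or later, DataFrameGroupBy.sample does the same job without the explicit concat loop. This is a swapped-in alternative rather than the answer's method; the random_state value is arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_ID": [1, 1, 2, 2, 3, 3],
})

# Draw n=1 row from every Group_ID group in a single call.
df_sampled = (
    df.groupby("Group_ID")
      .sample(n=1, random_state=1)
      .reset_index(drop=True)
)
print(df_sampled)
```

Passing n=2 (or frac=...) changes the per-group sample size; replace=True allows drawing more rows than a group contains.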
    
  • 2020-12-01 05:49
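    import numpy as np
    
    size = 2        # sample size per group
    replace = True  # with replacement (the same row may be drawn twice)
    # Draw `size` row labels from each group's index and return those rows
    fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
    df.groupby('Group_Id', as_index=False).apply(fn)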
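The same with-replacement idea can be written with NumPy's seeded Generator and a plain pd.concat instead of groupby.apply. A sketch under those assumptions; the names rng and sample_group are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

size = 2        # rows to draw per group
replace = True  # with replacement: the same row may be drawn twice

rng = np.random.default_rng(42)  # seeded generator for reproducibility


def sample_group(group: pd.DataFrame) -> pd.DataFrame:
    """Draw `size` row labels from this group and return those rows."""
    labels = rng.choice(group.index.to_numpy(), size=size, replace=replace)
    return group.loc[labels]


df_sampled = pd.concat(
    [sample_group(g) for _, g in df.groupby("Group_Id")],
    ignore_index=True,
)
print(df_sampled)
```

With replace=True every group yields exactly `size` rows even when it has fewer distinct rows; with replace=False, rng.choice raises ValueError if `size` exceeds a group's length.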
    
  • 2020-12-01 05:51

    From pandas 0.16.x onwards, pd.DataFrame.sample returns a random sample of items from an axis of the object, so it can be applied to each group:

    In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
    Out[664]:
      Name  Group_Id
    0  ABC         1
    1  XYZ         2
    2  DEF         3
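To draw m rows per group when some groups may have fewer than m rows, the sample size can be capped at each group's length, since sampling without replacement fails otherwise. A minimal sketch; the extra single-row group "ZZZ" and the value m = 2 are illustrative assumptions, and the loop over groups uses pd.concat rather than apply:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH", "ZZZ"],
    "Group_Id": [1, 1, 2, 2, 3, 3, 4],  # group 4 has a single row (illustrative)
})

m = 2  # rows wanted per group

# x.sample(n) without replacement raises ValueError when n > len(x),
# so request min(len(x), m) rows from each group instead.
out = pd.concat(
    [g.sample(n=min(len(g), m), random_state=0) for _, g in df.groupby("Group_Id")],
    ignore_index=True,
)
print(out)
```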
    
  • 2020-12-01 05:54

    Using random.choice, you can do something like this:

    import random
    name_group = {'AAA': 1, 'ABC':1, 'CCC':2, 'XYZ':2, 'DEF':3, 'YYH':3}
    
    names = list(name_group)  # create a list out of the keys in the name_group dict
    
    first_name = random.choice(names)
    first_group = name_group[first_name]
    print(first_name, first_group)
    

    random.choice(seq)

    Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
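The snippet above draws a single name from the whole dict, not one per group. A minimal pure-Python sketch that first inverts the mapping so random.choice can run once per group; by_group is a hypothetical helper name:

```python
import random
from collections import defaultdict

name_group = {'AAA': 1, 'ABC': 1, 'CCC': 2, 'XYZ': 2, 'DEF': 3, 'YYH': 3}

# Invert the mapping: group id -> list of names in that group.
by_group = defaultdict(list)
for name, group in name_group.items():
    by_group[group].append(name)

# random.choice needs a non-empty sequence; each list holds at least one name.
picked = {group: random.choice(names) for group, names in sorted(by_group.items())}
print(picked)
```

For m names per group without repeats, random.sample(names, min(m, len(names))) could replace random.choice.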
    