Assign control vs. treatment groupings randomly based on % for more than 2 groups

半阙折子戏

Piggybacking off my own previous question, "python pandas: assign control vs. treatment groupings randomly based on %".

Thanks to @maxU, I know how to assign random co

2 answers

感情败类

2021-01-15 18:11

It sounds like you're looking for a way to split your customer_ids into exact proportions rather than relying on chance. Here's one way to do that using pandas.qcut and np.random.permutation.

In [228]: df = pd.DataFrame({'customer_id': np.random.normal(size=10000), 
                             'group': np.random.choice(['a', 'b', 'c'], size=10000)})

In [229]: proportions = {'a':[.5,.5], 'b':[.4,.6], 'c':[.2,.8]}

In [230]: df.head()
Out[230]:
   customer_id group
0       0.6547     c
1       1.4190     a
2       0.4205     a
3       2.3266     a
4      -0.5691     b

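In [231]: def assigner(gp):
     ...:     group = gp.name  # group key: 'a', 'b' or 'c'
     ...:     # Cut the row positions at the cumulative proportions: 0 = control, 1 = treatment
     ...:     cut = np.asarray(pd.qcut(
     ...:         np.arange(gp.shape[0]),
     ...:         q=np.cumsum([0] + proportions[group]),
     ...:         labels=range(len(proportions[group]))
     ...:     ))
     ...:     # Shuffle the labels so the assignment is random but the counts stay fixed
     ...:     return pd.Series(cut[np.random.permutation(gp.shape[0])], index=gp.index, name='assignment')
     ...: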

In [232]: df['assignment'] = df.groupby('group', group_keys=False).apply(assigner)

In [233]: df.head()
Out[233]:
   customer_id group  assignment
0       0.6547     c           1
1       1.4190     a           1
2       0.4205     a           0
3       2.3266     a           1
4      -0.5691     b           0

In [234]: (df.groupby(['group', 'assignment'])
             .size()
             .unstack()
             .assign(proportion=lambda x: x[0] / (x[0] + x[1])))
Out[234]:
assignment     0     1  proportion
group
a           1659  1658      0.5002
b           1335  2003      0.3999
c            669  2676      0.2000

What's going on here?

Within each group, we call the function assigner.
assigner takes the group name, looks up its proportions in the predefined dictionary, and calls pd.qcut to split the row positions into 0 (control) and 1 (treatment).
np.random.permutation then shuffles the assignments.
The result is stored as a new column in the original dataframe.
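
For anyone who wants to run the steps above as a plain script, here is a minimal sketch of the same idea. It is an illustration under some assumptions, not the answerer's exact code: the helper name assign_exact, the seeded generator rng, the integer customer_id values and the explicit loop over groups are all my own choices. Since it only relies on the cumulative shares, each list in proportions could also hold more than two entries.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only so the run is repeatable

df = pd.DataFrame({
    "customer_id": np.arange(10_000),  # placeholder ids (assumption)
    "group": rng.choice(["a", "b", "c"], size=10_000),
})

# Share of each arm per group: position 0 = control, position 1 = treatment.
proportions = {"a": [0.5, 0.5], "b": [0.4, 0.6], "c": [0.2, 0.8]}


def assign_exact(group_name, n_rows):
    """Return n_rows shuffled arm labels in the group's proportions."""
    edges = np.concatenate([[0.0], np.cumsum(proportions[group_name])])
    edges[-1] = 1.0  # guard against floating-point drift in the cumulative sum
    # labels=False makes qcut return the integer bin index of each position.
    labels = pd.qcut(np.arange(n_rows), q=edges, labels=False)
    # Shuffling keeps the counts per arm fixed but randomizes who gets which arm.
    return rng.permutation(labels)


df["assignment"] = -1  # placeholder, overwritten for every row below
for name, idx in df.groupby("group").groups.items():
    df.loc[idx, "assignment"] = assign_exact(name, len(idx))

summary = (df.groupby(["group", "assignment"])
             .size()
             .unstack()
             .assign(proportion=lambda x: x[0] / (x[0] + x[1])))
print(summary)
```

Note that pd.qcut raises an error when a group is so small that two bin edges coincide, so very small groups would need separate handling.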


孤独总比滥情好

2021-01-15 18:12

In [13]: df
Out[13]:
  customer_id  Group
0         ABC      1
1         CDE      3
2         BHF      2
3         NID      1
4         WKL      3
5         SDI      2
6         JSK      1
7         OSM      3
8         MPA      2
9         MAD      1

In [14]: d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}

In [15]: df['Flag'] = \
    ...: df.groupby('Group')['customer_id'] \
    ...:   .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
    ...:

In [16]: df
Out[16]:
  customer_id  Group     Flag
0         ABC      1  Control
1         CDE      3     Test
2         BHF      2     Test
3         NID      1  Control
4         WKL      3  Control
5         SDI      2     Test
6         JSK      1     Test
7         OSM      3     Test
8         MPA      2  Control
9         MAD      1     Test
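
Since the question is about more than two groupings, here is a sketch of how the same transform pattern might extend to three arms. The arm names Test1 and Test2, the three-way shares in d and the seeded generator rng are hypothetical values made up for illustration; they are not from the original question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for repeatable draws

df = pd.DataFrame({
    "customer_id": ["ABC", "CDE", "BHF", "NID", "WKL",
                    "SDI", "JSK", "OSM", "MPA", "MAD"],
    "Group": [1, 3, 2, 1, 3, 2, 1, 3, 2, 1],
})

# Hypothetical three-arm split per group; each list must sum to 1.
arms = ["Control", "Test1", "Test2"]
d = {1: [0.5, 0.25, 0.25], 2: [0.4, 0.3, 0.3], 3: [0.2, 0.4, 0.4]}

# x.name is the group key, so each group draws with its own probabilities.
df["Flag"] = (
    df.groupby("Group")["customer_id"]
      .transform(lambda x: rng.choice(arms, size=len(x), p=d[x.name]))
)
print(df)
```

Each row is drawn independently here, so the shares hold only in expectation. With a handful of rows per group the realized split can be far from the target; if exact counts matter, the qcut-and-shuffle approach from the other answer is the better fit.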
