Question
I have a data frame like this:
   Customer  Day
0         A    1
1         A    1
2         A    1
3         A    2
4         B    3
5         B    4
I want to sample from it, drawing a different number of rows for each customer. The sample size for each customer is stored in another data frame (here the Day column holds the sample size). For example:
   Customer  Day
0         A    2
1         B    1
Suppose I want to sample per customer per day. So far I have this function:
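import numpy as np
def sampling(frame, a):
    # Draw `a` values from the group's Id column (np.random.choice samples with replacement by default)
    return np.random.choice(frame.Id, size=a)
grouped = frame.groupby(['Customer', 'Day'])
sampled = grouped.apply(sampling, a=??).reset_index()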
If I set the size parameter to a global constant, it runs without problems. But I don't know how to set it when the values differ per customer and are stored in a separate data frame.
Answer 1:
You can build a mapper (a dict) from df1, the data frame holding the sample sizes, and look up each customer's sample size in it:
mapper = df1.set_index('Customer')['Day'].to_dict()
df.groupby('Customer', as_index=False).apply(lambda x: x.sample(n = mapper[x.name]))
    Customer  Day
0 3        A    2
  2        A    1
1 4        B    3
This returns a MultiIndex; you can flatten it with reset_index:
df.groupby('Customer').apply(lambda x: x.sample(n = mapper[x.name])).reset_index(drop = True)
Customer Day
0 A 1
1 A 1
2 B 3
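For reference, here is a self-contained sketch of the same idea, assuming the two frames are named df and df1 as above. The helper name sample_per_customer and its seed parameter are illustrative and not part of the original answer. It loops over the groups and concatenates the pieces instead of calling apply, so it does not rely on x.name:

```python
import pandas as pd

# Data from the question: df holds the rows, df1 holds the sample size per customer
df = pd.DataFrame({"Customer": ["A", "A", "A", "A", "B", "B"],
                   "Day": [1, 1, 1, 2, 3, 4]})
df1 = pd.DataFrame({"Customer": ["A", "B"], "Day": [2, 1]})

# Customer -> sample size, e.g. {"A": 2, "B": 1}
mapper = df1.set_index("Customer")["Day"].to_dict()

def sample_per_customer(frame, sizes, seed=None):
    """Hypothetical helper: draw sizes[customer] rows from each customer's group."""
    parts = [
        group.sample(n=sizes[customer], random_state=seed)
        for customer, group in frame.groupby("Customer")
    ]
    # Stitch the per-customer samples together and drop the original row labels
    return pd.concat(parts).reset_index(drop=True)

sampled = sample_per_customer(df, mapper, seed=0)
print(sampled)
```

Which rows come back depends on the seed, but each customer always gets the number of rows listed in df1. Note that sample raises a ValueError if a requested size exceeds the group size, unless replace=True is passed.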
Source: https://stackoverflow.com/questions/58794340/sample-with-different-sample-sizes-per-customer