Question
I have a DataFrame containing probabilities for different events over a large number of sequential periods, and I want to transform it to show the probability of each event happening at least once over n periods. For example, I have this, which corresponds to n = 1:
event | period | probability
A | period 1 | 0.6
A | period 2 | 0.7
A | period 3 | 0.8
A | period 4 | 0.85
A | period 5 | 0.9
And I want to figure out the probability of A occurring at least once across two periods (n = 2), which would be:
A | period 1 | 1-(1-0.6)*(1-0.7)
A | period 2 | 1-(1-0.7)*(1-0.8)
A | period 3 | 1-(1-0.8)*(1-0.85)
A | period 4 | 1-(1-0.85)*(1-0.9)
And n = 3 would be:
A | period 1 | 1-(1-0.6)*(1-0.7)*(1-0.8)
A | period 2 | 1-(1-0.7)*(1-0.8)*(1-0.85)
A | period 3 | 1-(1-0.8)*(1-0.85)*(1-0.9)
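Written generally, and assuming the periods are independent (an assumption the formulas above make implicitly), the value for the window starting at period t, with T periods in total, is:

```latex
P_t^{(n)} = 1 - \prod_{k=0}^{n-1} \bigl(1 - p_{t+k}\bigr), \qquad t = 1, \dots, T - n + 1
```

For T = 5 this gives rows for periods 1 to 4 when n = 2 and periods 1 to 3 when n = 3, matching the examples above.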
Is there a Python/pandas function or term for this?
Answer 1:
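You can use groupby with transform. Within each event, take a rolling window of n rows, compute 1 - prod(1 - p) over it, and shift the result up by n - 1 rows so each value lines up with the first period of its window (the last n - 1 periods of each event get NaN):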
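import numpy as np
n = 2
# 1 - prod(1 - p) over each n-period window; shift(-n+1) aligns it to the window's first period
df['new_probability'] = df.groupby('event')['probability'].transform(lambda x: x.rolling(n).agg(lambda y: 1-np.prod(1-y)).shift(-n+1))
print(df)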
event period probability new_probability
A period1 0.60 0.880
A period2 0.70 0.940
A period3 0.80 0.970
A period4 0.85 0.985
A period5 0.90 NaN
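A self-contained version of the same idea, building the question's sample data so it can be run as-is. It is a sketch under a few assumptions: at_least_once is a hypothetical helper name, rows are already sorted by period within each event, and periods are independent. It swaps the inner agg(lambda ...) for rolling(n).apply(np.prod, raw=True), which hands each window to np.prod as a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Sample data from the question (a single event, five periods).
df = pd.DataFrame({
    "event": ["A"] * 5,
    "period": ["period1", "period2", "period3", "period4", "period5"],
    "probability": [0.6, 0.7, 0.8, 0.85, 0.9],
})


def at_least_once(df: pd.DataFrame, n: int) -> pd.Series:
    """Hypothetical helper: P(event occurs at least once in this period
    and the next n - 1 periods), assuming independent periods and rows
    sorted by period within each event."""
    def per_event(p: pd.Series) -> pd.Series:
        # Product of the "does not happen" probabilities over each trailing window.
        none_in_window = (1 - p).rolling(n).apply(np.prod, raw=True)
        # Shift up so each row's window starts at that row instead of ending there.
        return 1 - none_in_window.shift(-(n - 1))

    # Grouping by event keeps windows from spanning two different events.
    return df.groupby("event")["probability"].transform(per_event)


df["p_n2"] = at_least_once(df, 2)
df["p_n3"] = at_least_once(df, 3)
print(df)
```

Because the rolling window runs inside groupby("event"), a frame with several events never mixes one event's last periods with the next event's first ones.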
For n = 3:
n = 3
df['new_probability'] = df.groupby('event')['probability'].transform(lambda x: x.rolling(n).agg(lambda y: 1-np.prod(1-y)).shift(-n+1))
print(df)
event period probability new_probability
A period1 0.60 0.976
A period2 0.70 0.991
A period3 0.80 0.997
A period4 0.85 NaN
A period5 0.90 NaN
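Since the question mentions a large number of periods, here is an alternative sketch that avoids calling a Python function once per window. It uses the identity 1 - prod(1 - p) = 1 - exp(sum(log(1 - p))), so the window product becomes a rolling sum, which pandas computes with a built-in aggregation and is likely faster on big frames. Assumptions: at_least_once_log is a hypothetical name, every probability is strictly below 1 (log(0) would be -inf), rows are sorted by period within each event, and periods are independent. The two-event sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: event A from the question plus a made-up event B.
df = pd.DataFrame({
    "event": ["A"] * 5 + ["B"] * 3,
    "probability": [0.6, 0.7, 0.8, 0.85, 0.9, 0.1, 0.2, 0.3],
})


def at_least_once_log(df: pd.DataFrame, n: int) -> pd.Series:
    """Hypothetical helper: 1 - prod(1 - p) over each forward window of
    n rows per event, computed as a rolling sum of logs.
    Assumes every probability is strictly below 1."""
    # log(1 - p), computed accurately even for very small p.
    log_miss = np.log1p(-df["probability"])
    # Rolling sum within each event, shifted so the window starts at each row.
    window_sum = log_miss.groupby(df["event"]).transform(
        lambda s: s.rolling(n).sum().shift(-(n - 1))
    )
    # 1 - exp(sum) == 1 - prod(1 - p); -expm1 keeps precision when the sum is near 0.
    return -np.expm1(window_sum)


df["p_n2"] = at_least_once_log(df, 2)
print(df)
```

The results match the direct product up to floating-point rounding; the trade-off is the requirement that no probability equals exactly 1.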
Source: https://stackoverflow.com/questions/61188744/transforming-a-dataframe-of-probabilities-for-specific-periods-to-be-probabiliti