I have DataFrame with two columns of \"a\" and \"b\". How can I find the conditional probability of \"a\" given specific \"b\"?
df.groupby(\'a\').groupby(\
Consider the DataFrame that Maxymoo suggested:
df = pd.DataFrame({'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'], 'B':['one', 'one', 'two', 'three','two', 'two', 'one', 'three'], 'C':np.random.randn(8), 'D':np.random.randn(8)})
df
A B C D
0 foo one 0.229206 -1.899999
1 bar one 0.174972 0.328746
2 foo two -1.384699 -1.691151
3 bar three -1.008328 -0.915467
4 foo two -0.065298 -0.107240
5 bar two 1.871916 0.798135
6 foo one 1.589609 -1.682237
7 foo three 2.292783 0.639595
Lets assume that we are interested to calculate the probability of (y = foo) given x = one: P(y=foo|x=one) = ?
Approach 1:
df.groupby('B')['A'].value_counts()/df.groupby('B')['A'].count()
B
one foo 0.666667
bar 0.333333
three foo 0.500000
bar 0.500000
two foo 0.666667
bar 0.333333
dtype: float64
So the answer is: 0.6667
Approach 2:
Probability of x = one: 0.375
df['B'].value_counts()/df['B'].count()
one 0.375
two 0.375
three 0.250
dtype: float64
Probability of y = foo: 0.625
df['A'].value_counts()/df['A'].count()
foo 0.625
bar 0.375
dtype: float64
Probability of (x=one|y=foo): 0.4
df.groupby('A')['B'].value_counts()/df.groupby('A')['B'].count()
A
bar one 0.333333
two 0.333333
three 0.333333
foo one 0.400000
two 0.400000
three 0.200000
dtype: float64
Therefore: P(y=foo|x=one) = P(x=one|y=foo)*P(y=foo)/P(x=one) = 0.4 * 0.625 / 0.375 = 0.6667