Pandas - Conditional Probability of a given specific b

孤独总比滥情好 2021-01-03 02:32

I have a DataFrame with two columns, "a" and "b". How can I find the conditional probability of "a" given a specific value of "b"?

df.groupby('a').groupby(


        
5 answers
  •  执笔经年
    2021-01-03 03:16

    Consider the DataFrame that Maxymoo suggested:

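    import numpy as np
    import pandas as pd

    # C and D are random, so their values will differ from the output below on each run.
    df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8),
                       'D': np.random.randn(8)})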
    
    df
         A      B         C         D
    0  foo    one  0.229206 -1.899999
    1  bar    one  0.174972  0.328746
    2  foo    two -1.384699 -1.691151
    3  bar  three -1.008328 -0.915467
    4  foo    two -0.065298 -0.107240
    5  bar    two  1.871916  0.798135
    6  foo    one  1.589609 -1.682237
    7  foo  three  2.292783  0.639595
    

    Let's assume we want to calculate the probability that y = foo given x = one, where y is the value in column A and x is the value in column B: P(y=foo|x=one) = ?
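The target quantity can also be estimated straight from the definition of conditional probability: keep only the rows where B is 'one', then take the fraction of those rows where A is 'foo'. A minimal sketch, assuming only columns A and B of the frame above (C and D are random and unused here):

```python
import pandas as pd

# Only the categorical columns from the example; C and D are omitted because they are unused.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']})

# Condition on x = one: keep only the rows where column B equals 'one'.
given = df[df['B'] == 'one']

# The mean of a boolean mask is the fraction of True values,
# i.e. the share of the conditioned rows where A equals 'foo'.
p_foo_given_one = (given['A'] == 'foo').mean()
print(round(p_foo_given_one, 4))  # 0.6667
```

Filtering first and then averaging a mask avoids building the full table when only one (y, x) pair is needed.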

    Approach 1:

    df.groupby('B')['A'].value_counts()/df.groupby('B')['A'].count()
    B         
    one    foo    0.666667
           bar    0.333333
    three  foo    0.500000
           bar    0.500000
    two    foo    0.666667
           bar    0.333333
    dtype: float64
    

    So the answer, read from the (one, foo) row, is P(y=foo|x=one) ≈ 0.6667.
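The division in Approach 1 can be folded into the `value_counts` call itself with its `normalize=True` argument, which returns relative frequencies within each group. A sketch, again using only columns A and B:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']})

# For each value of B, the relative frequency of each value of A, i.e. P(A | B).
cond = df.groupby('B')['A'].value_counts(normalize=True)

# The result has a (B, A) MultiIndex, so a single entry is looked up with a tuple.
p_foo_given_one = cond.loc[('one', 'foo')]
print(round(p_foo_given_one, 4))  # 0.6667
```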

    Approach 2 (Bayes' theorem):

    Probability of x = one: 0.375

    df['B'].value_counts()/df['B'].count()
    one      0.375
    two      0.375
    three    0.250
    dtype: float64
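The marginal P(x) can likewise be computed with `Series.value_counts(normalize=True)`, which divides each count by the number of non-null values, the same as dividing by `count()` above. A short sketch with the same two columns:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']})

# Relative frequency of each value of B, i.e. the marginal P(x).
p_b = df['B'].value_counts(normalize=True)
print(p_b['one'])  # 0.375
```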
    

    Probability of y = foo: 0.625

    df['A'].value_counts()/df['A'].count()
    foo    0.625
    bar    0.375
    dtype: float64
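Both marginals can also be read off the joint distribution. `pd.crosstab(..., normalize='all')` divides every cell of the contingency table by the total count, giving P(y, x); column sums give P(x), row sums give P(y), and the conditional follows from P(y=foo|x=one) = P(y=foo, x=one) / P(x=one). A sketch under the same assumptions:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']})

# Rows are values of A (y), columns are values of B (x); each cell is P(y, x).
joint = pd.crosstab(df['A'], df['B'], normalize='all')

# Marginals are the column and row sums of the joint table.
p_x = joint.sum(axis=0)  # P(x), indexed by the values of B
p_y = joint.sum(axis=1)  # P(y), indexed by the values of A

p_foo_and_one = joint.loc['foo', 'one']        # 2 of 8 rows, so 0.25
p_foo_given_one = p_foo_and_one / p_x['one']   # 0.25 / 0.375
print(round(p_foo_given_one, 4))  # 0.6667
```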
    

    Conditional probability of x = one given y = foo, P(x=one|y=foo): 0.4

    df.groupby('A')['B'].value_counts()/df.groupby('A')['B'].count()
    A         
    bar  one      0.333333
         two      0.333333
         three    0.333333
    foo  one      0.400000
         two      0.400000
         three    0.200000
    dtype: float64
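The same likelihood table P(x|y) can be produced as a wide DataFrame with `pd.crosstab(..., normalize='index')`, which divides each row of the contingency table by its row total. A sketch with the same two columns:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']})

# Rows are values of A (y); normalize='index' makes each row sum to 1, so each row is P(B | A).
likelihood = pd.crosstab(df['A'], df['B'], normalize='index')

print(likelihood.loc['foo', 'one'])  # 0.4
```

The wide layout makes it easy to check that every row sums to 1.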
    

    Therefore, by Bayes' theorem: P(y=foo|x=one) = P(x=one|y=foo)*P(y=foo)/P(x=one) = 0.4 * 0.625 / 0.375 = 0.6667
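Bayes' theorem can be applied to every (y, x) pair at once and compared against the direct conditional table from Approach 1. A sketch, assuming the same two columns: `DataFrame.mul(..., axis=0)` scales each row by the prior P(y), and `DataFrame.div(..., axis=1)` divides each column by the evidence P(x), both aligning on labels:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']})

p_a = df['A'].value_counts(normalize=True)                      # prior P(y)
p_b = df['B'].value_counts(normalize=True)                      # evidence P(x)
likelihood = pd.crosstab(df['A'], df['B'], normalize='index')   # P(x | y), rows = A

# Bayes: P(y | x) = P(x | y) * P(y) / P(x).
posterior = likelihood.mul(p_a, axis=0).div(p_b, axis=1)

# Direct estimate for comparison: each column normalised to sum to 1, i.e. P(A | B).
direct = pd.crosstab(df['A'], df['B'], normalize='columns')

# Largest absolute disagreement between the two tables (floating-point noise only).
max_diff = (posterior - direct).abs().to_numpy().max()
print(round(posterior.loc['foo', 'one'], 4))  # 0.6667
```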
