Pandas - Conditional Probability of a given specific b


孤独总比滥情好 2021-01-03 02:32

I have a DataFrame with two columns, "a" and "b". How can I find the conditional probability of "a" given a specific "b"?

df.groupby('a').groupby(


      
      
        
5 answers

执笔经年 (original poster) 2021-01-03 03:16

Consider the DataFrame that Maxymoo suggested:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),  # random filler, not used in the probabilities
    'D': np.random.randn(8),
})

df
     A      B         C         D
0  foo    one  0.229206 -1.899999
1  bar    one  0.174972  0.328746
2  foo    two -1.384699 -1.691151
3  bar  three -1.008328 -0.915467
4  foo    two -0.065298 -0.107240
5  bar    two  1.871916  0.798135
6  foo    one  1.589609 -1.682237
7  foo  three  2.292783  0.639595


Let's assume we want to calculate the probability of y = foo given x = one, where y is column A and x is column B: P(y=foo|x=one) = ?

Approach 1: 

df.groupby('B')['A'].value_counts()/df.groupby('B')['A'].count()
B         
one    foo    0.666667
       bar    0.333333
three  foo    0.500000
       bar    0.500000
two    foo    0.666667
       bar    0.333333
dtype: float64


So the answer is the (one, foo) entry: P(y=foo|x=one) ≈ 0.6667, since 2 of the 3 rows with B = one have A = foo.
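
The same conditional table can also be produced in a single call. The following is a minimal sketch rather than part of the original answer: it rebuilds only columns A and B of the example frame (C and D are not needed) and uses `pd.crosstab` with `normalize='index'`, so that each row (one value of B) sums to 1. The variable names `cond`, `cond_series`, `p_foo_given_one` and `p_same` are chosen here for illustration.

```python
import pandas as pd

# Columns A and B of the example frame; C and D are omitted on purpose.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

# Rows are values of B, columns are values of A. normalize='index'
# divides each row by its row total, which gives P(A | B).
cond = pd.crosstab(df['B'], df['A'], normalize='index')
p_foo_given_one = cond.loc['one', 'foo']

# Equivalent with groupby: relative frequencies of A within each B group.
cond_series = df.groupby('B')['A'].value_counts(normalize=True)
p_same = cond_series.loc[('one', 'foo')]

print(round(p_foo_given_one, 4), round(p_same, 4))
```

The crosstab form is convenient when the whole table is wanted, because any P(A=a|B=b) can then be read off with `cond.loc[b, a]`.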

Approach 2:

Probability of x = one: 0.375

df['B'].value_counts()/df['B'].count()
one      0.375
two      0.375
three    0.250
dtype: float64


Probability of y = foo: 0.625

df['A'].value_counts()/df['A'].count()
foo    0.625
bar    0.375
dtype: float64


Probability of (x=one|y=foo): 0.4

df.groupby('A')['B'].value_counts()/df.groupby('A')['B'].count()
A         
bar  one      0.333333
     two      0.333333
     three    0.333333
foo  one      0.400000
     two      0.400000
     three    0.200000
dtype: float64


Therefore, by Bayes' theorem: P(y=foo|x=one) = P(x=one|y=foo)*P(y=foo)/P(x=one) = 0.4 * 0.625 / 0.375 ≈ 0.6667
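
The Bayes calculation can be checked numerically against a direct count. This is a small sketch under the same assumption as the example data (only columns A and B are rebuilt); the names `p_x`, `p_y`, `p_x_given_y`, `bayes` and `direct` are illustrative. It relies on the fact that the mean of a boolean Series is the fraction of True values.

```python
import pandas as pd

# Columns A and B of the example frame.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

p_x = (df['B'] == 'one').mean()   # P(x=one)
p_y = (df['A'] == 'foo').mean()   # P(y=foo)

# P(x=one | y=foo): restrict to rows where A is foo, then take the share with B == one.
p_x_given_y = (df.loc[df['A'] == 'foo', 'B'] == 'one').mean()

# Bayes' theorem: P(y|x) = P(x|y) * P(y) / P(x)
bayes = p_x_given_y * p_y / p_x

# Direct estimate: restrict to rows where B is one, then take the share with A == foo.
direct = (df.loc[df['B'] == 'one', 'A'] == 'foo').mean()

print(bayes, direct)
```

Both routes give the same value on this data, which is expected: with empirical frequencies, Bayes' theorem is only a rearrangement of the same counts.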
    
             
                                                        
            
            
              
                