Turn Pandas DataFrame of strings into histogram

后端未结

关注

 3  1537

Suppose I have a DataFrame of created like this:

import pandas as pd
s1 = pd.Series([\'a\', \'b\', \'a\', \'c\', \'a\', \'b\'])
s2 = pd.Series([\'a\', \'f\',


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  轻奢々        
                
              
                            
                2020-12-31 03:47
              
            
            
                                                                       
Recreating the dataframe: 

import pandas as pd
s1 = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
s2 = pd.Series(['a', 'f', 'a', 'd', 'a', 'f', 'f'])
d = pd.DataFrame({'s1': s1, 's2': s2})


To get the histogram with subplots as desired: 

d.apply(pd.value_counts).plot(kind='bar', subplots=True)




The OP mentioned pd.value_counts in the question. I think the missing piece is just that there is no reason to "manually" create the desired bar plot. 

The output from d.apply(pd.value_counts) is a pandas dataframe. We can plot the values like any other dataframe, and selecting the option subplots=True gives us what we want. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  暗喜        
                
              
                            
                2020-12-31 03:55
              
            
            
                                                                       
You can use pd.value_counts (value_counts is also a series method):

In [20]: d.apply(pd.value_counts)
Out[20]: 
   s1  s2
a   3   3
b   2 NaN
c   1 NaN
d NaN   1
f NaN   3


and than plot the resulting DataFrame.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  离开以前        
                
              
                            
                2020-12-31 04:04
              
            
            
                                                                       
I would shove the Series into a collections.Counter (documentation) (You might need to convert it to a list first).  I am not a pandas expert, but I think you should be able to fold the Counter object back into a Series, indexed by the strings, and use that to make your plots.

This is not working because it is (rightly) raising errors when it tries to guess where the bin edges should be, which simply makes no sense with strings.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复