What is the meaning of “axis” attribute in a Pandas DataFrame?


Taking the following example:

>>> df1 = pd.DataFrame({"x":[1, 2, 3, 4, 5],
                        "y":[3, 4, 5, 6, 7]},
                      ind

5 answers

刺人心, 2021-02-04 03:08

First, OP misunderstood the rows and columns in his/her dataframe. 


  But the actual output considers rows that are found in both dataframes. (the only common row element 'y')


OP thought the label y referred to a row. However, y is a column name.

import pandas as pd

df1 = pd.DataFrame(
         {"x":[1, 2, 3, 4, 5],  # <-- looks like row x but is actually column x
          "y":[3, 4, 5, 6, 7]}, # <-- looks like row y but is actually column y
          index=['a', 'b', 'c', 'd', 'e'])
print(df1)

            \col   x    y
 index or row\
          a       1     3   |   a
          b       2     4   v   x
          c       3     5   r   i
          d       4     6   o   s
          e       5     7   w   0

               -> column
                 a x i s 1


It is easy to be misled, since in the dictionary it looks like x and y are two rows.

If you generate df1 from a list of lists, it should be more intuitive:

df1 = pd.DataFrame([[1,3], 
                    [2,4],
                    [3,5],
                    [4,6],
                    [5,7]],
                    index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
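A quick way to convince yourself that the dict version and the list-of-lists version build the same frame is to compare them with `DataFrame.equals`. This is a small sketch; the names `from_dict` and `from_lists` are only illustrative.

```python
import pandas as pd

# Column-oriented: each dict key becomes a column label.
from_dict = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                          "y": [3, 4, 5, 6, 7]},
                         index=['a', 'b', 'c', 'd', 'e'])

# Row-oriented: each inner list becomes one row.
from_lists = pd.DataFrame([[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]],
                          index=['a', 'b', 'c', 'd', 'e'],
                          columns=["x", "y"])

# Both have 5 rows (axis 0) and 2 columns (axis 1).
print(from_dict.shape)                # (5, 2)
print(from_dict.equals(from_lists))   # True
print(list(from_dict.columns))        # ['x', 'y']
```

So x and y are column labels in both cases; only the row labels a to e live on axis 0.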


So, back to the problem: concat is short for concatenate, which means to link together in a series or chain [source]. Performing concat along axis 0 means linking the two objects along axis 0.

   1
   1   <-- series 1
   1
^  ^  ^
|  |  |               1
c  a  a               1
o  l  x               1
n  o  i   gives you   2
c  n  s               2
a  g  0               2
t  |  |
|  V  V
v 
   2
   2   <--- series 2
   2
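The picture above can be reproduced with two small Series; this is a minimal sketch using `pd.concat`, where `ignore_index=True` is an extra choice so the result is numbered 0 to 5 instead of repeating 0, 1, 2.

```python
import pandas as pd

series_1 = pd.Series([1, 1, 1])
series_2 = pd.Series([2, 2, 2])

# axis=0 (the default) stacks series_2 underneath series_1.
# ignore_index=True renumbers the result 0..5.
stacked = pd.concat([series_1, series_2], axis=0, ignore_index=True)
print(stacked.tolist())    # [1, 1, 1, 2, 2, 2]

# axis=1 instead places them side by side, one column per Series.
side_by_side = pd.concat([series_1, series_2], axis=1)
print(side_by_side.shape)  # (3, 2)
```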


So... I think you get the idea now. What about the sum function in pandas? What does sum(axis=0) mean?

Suppose the data looks like

   1 2
   1 2
   1 2


Summing along axis 0, you might guess. Yes!!

^  ^  ^
|  |  |               
s  a  a               
u  l  x                
m  o  i   gives you two values 3 6 !
|  n  s               
v  g  0               
   |  |
   V  V
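Here is the same 3x2 data as a DataFrame, so you can check the result directly; a minimal sketch that also shows axis=1 for contrast.

```python
import pandas as pd

data = pd.DataFrame([[1, 2],
                     [1, 2],
                     [1, 2]])

# axis=0: sum down the rows, leaving one value per column.
print(data.sum(axis=0).tolist())  # [3, 6]

# axis=1: sum across the columns, leaving one value per row.
print(data.sum(axis=1).tolist())  # [3, 3, 3]
```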


What about dropna? Suppose you have the data

   1  2  NaN
  NaN 3   5
   2  4   6


and you only want to keep

2
3
4


The documentation says: "Return object with labels on given axis omitted where alternately any or all of the data are missing."

Should you use dropna(axis=0) or dropna(axis=1)? Think about it and try it out with

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])

# df.dropna(axis=0) or df.dropna(axis=1) ?


Hint: think about the word along. 
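For anyone who wants to check their answer afterwards, here is a minimal sketch that runs both options on the same frame (the names `rows_dropped` and `cols_dropped` are just illustrative).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])

# axis=0: drop every row (index label) that contains a NaN.
rows_dropped = df.dropna(axis=0)
print(rows_dropped.values.tolist())  # [[2.0, 4.0, 6.0]]

# axis=1: drop every column that contains a NaN.
cols_dropped = df.dropna(axis=1)
print(cols_dropped[1].tolist())      # [2, 3, 4]
```

Only axis=1 leaves the 2, 3, 4 column you wanted to keep.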

难免孤独, 2021-02-04 03:22

This is my trick with axis: just add the operation in your head so that it reads clearly:


axis 0 = rows
axis 1 = columns


If you “sum” through axis=0, you are summing all rows, and the output will be a single row with the same number of columns.
If you “sum” through axis=1, you are summing all columns, and the output will be a single column with the same number of rows.
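A small sketch of the rule above, on a hypothetical 3x2 frame. Note that pandas returns the "single row" or "single column" as a Series (indexed by column labels or row labels respectively) rather than a one-row DataFrame. It also shows that pandas accepts the axis names `"index"` and `"columns"` in place of 0 and 1.

```python
import pandas as pd

# Hypothetical 3x2 frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_totals = df.sum(axis=0)  # one value per column -> length 2
row_totals = df.sum(axis=1)  # one value per row    -> length 3

print(col_totals.tolist())   # [6, 60]
print(row_totals.tolist())   # [11, 22, 33]

# 'index' is an alias for 0, 'columns' an alias for 1.
print(df.sum(axis="index").equals(col_totals))    # True
print(df.sum(axis="columns").equals(row_totals))  # True
```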

广开言路, 2021-02-04 03:24

Data:

In [55]: df1
Out[55]:
   x  y
a  1  3
b  2  4
c  3  5
d  4  6
e  5  7

In [56]: df2
Out[56]:
   y  z
b  1  9
c  3  8
d  5  7
e  7  6
f  9  5


Concatenated horizontally (axis=1), keeping only the index labels found in both DFs (rows are aligned on the index):

In [57]: pd.concat([df1, df2], join='inner', axis=1)
Out[57]:
   x  y  y  z
b  2  4  1  9
c  3  5  3  8
d  4  6  5  7
e  5  7  7  6


Concatenated vertically (DEFAULT: axis=0), using columns found in both DFs:

In [58]: pd.concat([df1, df2], join='inner')
Out[58]:
   y
a  3
b  4
c  5
d  6
e  7
b  1
c  3
d  5
e  7
f  9


If you don't use the inner join (the default is join='outer'), you get this instead:

In [62]: pd.concat([df1, df2])
Out[62]:
     x  y    z
a  1.0  3  NaN
b  2.0  4  NaN
c  3.0  5  NaN
d  4.0  6  NaN
e  5.0  7  NaN
b  NaN  1  9.0
c  NaN  3  8.0
d  NaN  5  7.0
e  NaN  7  6.0
f  NaN  9  5.0

In [63]: pd.concat([df1, df2], axis=1)
Out[63]:
     x    y    y    z
a  1.0  3.0  NaN  NaN
b  2.0  4.0  1.0  9.0
c  3.0  5.0  3.0  8.0
d  4.0  6.0  5.0  7.0
e  5.0  7.0  7.0  6.0
f  NaN  NaN  9.0  5.0
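The session above does not show how df1 and df2 were created, so this sketch rebuilds them from the printed output and checks the shape of each of the four results.

```python
import pandas as pd

# df1 and df2 reconstructed from the printed output above.
df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

# axis=1, inner: keep only index labels present in both frames.
wide_inner = pd.concat([df1, df2], join="inner", axis=1)
print(list(wide_inner.index), list(wide_inner.columns))
# ['b', 'c', 'd', 'e'] ['x', 'y', 'y', 'z']

# axis=0, inner: keep only column labels present in both frames.
tall_inner = pd.concat([df1, df2], join="inner")
print(list(tall_inner.columns), len(tall_inner))
# ['y'] 10

# Default join='outer': missing cells are filled with NaN.
tall_outer = pd.concat([df1, df2])
wide_outer = pd.concat([df1, df2], axis=1)
print(tall_outer.shape, wide_outer.shape)
# (10, 3) (6, 4)
```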


旧巷少年郎, 2021-02-04 03:24

Interpret axis=0 as applying the algorithm down each column, or to the row labels (the index). A more detailed schema here.

If you apply that general interpretation to your case, the algorithm here is concat. Thus for axis=0, it means: 

for each column, take all the rows down (across all the DataFrames being concatenated), and concatenate them only when that column is common to all of them (because you selected join=inner).

So it means taking every column x and concatenating it down the rows, which stacks each chunk of rows one after another. However, x is not present in every DataFrame, so it is not kept in the final result. The same applies to z. Column y is kept because it is present in all DataFrames. That is the result you got.
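A minimal sketch of that reasoning, using df1 and df2 as printed in the thread's data: with axis=0 and join='inner', the surviving columns match the intersection of the input columns, and y's values are simply stacked.

```python
import pandas as pd

# Frames reconstructed from the data printed in the thread.
df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

# Column labels shared by both frames.
shared = df1.columns.intersection(df2.columns)
result = pd.concat([df1, df2], axis=0, join="inner")

print(list(shared))          # ['y']
print(list(result.columns))  # ['y']
# df1's y values followed by df2's y values.
print(result["y"].tolist())  # [3, 4, 5, 6, 7, 1, 3, 5, 7, 9]
```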

刺人心, 2021-02-04 03:28

If anyone needs a visual description, here is an image:

