What is the meaning of “axis” attribute in a Pandas DataFrame?

遇见更好的自我 2021-02-04 02:28

Taking the following example:

>>> df1 = pd.DataFrame({"x":[1, 2, 3, 4, 5], 
                        "y":[3, 4, 5, 6, 7]}, 
                      ind         


        
5 Answers
  • 2021-02-04 03:08

    First, OP misunderstood the rows and columns in his/her dataframe.

    The OP wrote: "But the actual output considers rows that are found in both dataframes (the only common row element 'y')."

    OP thought the label y is for row. However, y is a column name.

    df1 = pd.DataFrame(
             {"x":[1, 2, 3, 4, 5],  # <-- looks like row x but actually col x
              "y":[3, 4, 5, 6, 7]}, # <-- looks like row y but actually col y
              index=['a', 'b', 'c', 'd', 'e'])
    print(df1)
    
                \col   x    y
     index or row\
              a       1     3   |   a
              b       2     4   v   x
              c       3     5   r   i
              d       4     6   o   s
              e       5     7   w   0
    
                   -> column
                     a x i s 1
    

    It is very easy to be misled since in the dictionary, it looks like y and x are two rows.

    If you generate df1 from a list of lists, it should be more intuitive:

    df1 = pd.DataFrame([[1,3], 
                        [2,4],
                        [3,5],
                        [4,6],
                        [5,7]],
                        index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
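
    As a quick check that dictionary keys become column names, here is a minimal sketch that builds the frame both ways and compares the results. The variable names df_from_dict and df_from_lists are illustrative only and do not appear in the question.

```python
import pandas as pd

index = ["a", "b", "c", "d", "e"]

# Dictionary form: each key becomes a column label.
df_from_dict = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                             "y": [3, 4, 5, 6, 7]}, index=index)

# List-of-lists form: each inner list is one row.
df_from_lists = pd.DataFrame([[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]],
                             index=index, columns=["x", "y"])

print(list(df_from_dict.columns))  # column labels (axis 1)
print(list(df_from_dict.index))    # row labels (axis 0)
print(df_from_dict.equals(df_from_lists))
```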
    

    So back to the problem: concat is shorthand for concatenate, which means to link together in a series or chain. Performing concat along axis 0 means linking two objects along axis 0.

       1
       1   <-- series 1
       1
    ^  ^  ^
    |  |  |               1
    c  a  a               1
    o  l  x               1
    n  o  i   gives you   2
    c  n  s               2
    a  g  0               2
    t  |  |
    |  V  V
    v 
       2
       2   <--- series 2
       2
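
    The picture above can be reproduced with two small Series. This is only a sketch; the names series_1, series_2 and stacked are made up for the illustration.

```python
import pandas as pd

series_1 = pd.Series([1, 1, 1])
series_2 = pd.Series([2, 2, 2])

# axis=0 stacks the second series below the first one.
stacked = pd.concat([series_1, series_2], axis=0)

print(stacked.tolist())        # [1, 1, 1, 2, 2, 2]
print(stacked.index.tolist())  # original labels are kept: [0, 1, 2, 0, 1, 2]
```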
    

    So, I think you have the feeling now. What about the sum function in pandas? What does sum(axis=0) mean?

    Suppose data looks like

       1 2
       1 2
       1 2
    

    Summing along axis 0, you may guess. Yes!

    ^  ^  ^
    |  |  |               
    s  a  a               
    u  l  x                
    m  o  i   gives you two values 3 6 !
    |  n  s               
    v  g  0               
       |  |
       V  V
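
    A minimal sketch of the same sum, assuming the three rows of 1 2 shown above are held in a DataFrame (the names data and column_totals are illustrative):

```python
import pandas as pd

data = pd.DataFrame([[1, 2],
                     [1, 2],
                     [1, 2]])

# Summing along axis 0 walks down the rows, giving one total per column.
column_totals = data.sum(axis=0)

print(column_totals.tolist())  # [3, 6]
```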
    

    What about dropna? Suppose you have data

       1  2  NaN
      NaN 3   5
       2  4   6
    

    and you only want to keep

    2
    3
    4
    

    The documentation says: "Return object with labels on given axis omitted where alternately any or all of the data are missing."

    Should you put dropna(axis=0) or dropna(axis=1)? Think about it and try it out with

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[1, 2, np.nan],
                       [np.nan, 3, 5],
                       [2, 4, 6]])
    
    # df.dropna(axis=0) or df.dropna(axis=1) ?
    

    Hint: think about the word along.
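
    If you want to check your guess, the sketch below runs both calls on the same data. The names rows_kept and cols_kept are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 5],
                   [2, 4, 6]])

# axis=0 omits row labels: every row that contains a NaN is dropped.
rows_kept = df.dropna(axis=0)

# axis=1 omits column labels: every column that contains a NaN is dropped.
cols_kept = df.dropna(axis=1)

print(rows_kept)
print(cols_kept)
```

    With this data, only the axis=1 call leaves the single column 2, 3, 4. The axis=0 call keeps just the last row.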

  • 2021-02-04 03:22

    This is my trick with axis: just add the operation in your mind to make it sound clear:

    • axis 0 = rows
    • axis 1 = columns

    If you “sum” through axis=0, you are summing all rows, and the output will be a single row with the same number of columns. If you “sum” through axis=1, you are summing all columns, and the output will be a single column with the same number of rows.
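
    To see which labels survive in each case, here is a small sketch using the df1 from the question (per_column and per_row are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                    "y": [3, 4, 5, 6, 7]},
                   index=["a", "b", "c", "d", "e"])

per_column = df1.sum(axis=0)  # rows collapse: one total per column
per_row = df1.sum(axis=1)     # columns collapse: one total per row

print(per_column.index.tolist(), per_column.tolist())
# ['x', 'y'] [15, 25]
print(per_row.index.tolist(), per_row.tolist())
# ['a', 'b', 'c', 'd', 'e'] [4, 6, 8, 10, 12]
```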

  • 2021-02-04 03:24

    Data:

    In [55]: df1
    Out[55]:
       x  y
    a  1  3
    b  2  4
    c  3  5
    d  4  6
    e  5  7
    
    In [56]: df2
    Out[56]:
       y  z
    b  1  9
    c  3  8
    d  5  7
    e  7  6
    f  9  5
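
    The answer only prints the two frames. One way to build them so the sessions below can be re-run is sketched here; the df2 constructor is an assumption inferred from the printed output, not code taken from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                    "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))

# Assumed construction: values and labels are read off Out[56].
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9],
                    "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

print(df1)
print(df2)
```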
    

    Concatenated horizontally (axis=1), using index elements found in both DFs (aligned by indexes for joining):

    In [57]: pd.concat([df1, df2], join='inner', axis=1)
    Out[57]:
       x  y  y  z
    b  2  4  1  9
    c  3  5  3  8
    d  4  6  5  7
    e  5  7  7  6
    

    Concatenated vertically (DEFAULT: axis=0), using columns found in both DFs:

    In [58]: pd.concat([df1, df2], join='inner')
    Out[58]:
       y
    a  3
    b  4
    c  5
    d  6
    e  7
    b  1
    c  3
    d  5
    e  7
    f  9
    

    If you don't use join='inner' (the default is an outer join), you get:

    In [62]: pd.concat([df1, df2])
    Out[62]:
         x  y    z
    a  1.0  3  NaN
    b  2.0  4  NaN
    c  3.0  5  NaN
    d  4.0  6  NaN
    e  5.0  7  NaN
    b  NaN  1  9.0
    c  NaN  3  8.0
    d  NaN  5  7.0
    e  NaN  7  6.0
    f  NaN  9  5.0
    
    In [63]: pd.concat([df1, df2], axis=1)
    Out[63]:
         x    y    y    z
    a  1.0  3.0  NaN  NaN
    b  2.0  4.0  1.0  9.0
    c  3.0  5.0  3.0  8.0
    d  4.0  6.0  5.0  7.0
    e  5.0  7.0  7.0  6.0
    f  NaN  NaN  9.0  5.0
    
  • 2021-02-04 03:24

    Interpret axis=0 as applying the algorithm down each column, or to the row labels (the index). A more detailed schema here.

    If you apply that general interpretation to your case, the algorithm here is concat. Thus for axis=0, it means:

    for each column, take all the rows down (across all the dataframes for concat), and concatenate them when they are in common (because you selected join='inner').

    So the meaning would be to take all columns x and concat them down the rows which would stack each chunk of rows one after another. However, here x is not present everywhere, so it is not kept for the final result. The same applies for z. For y the result is kept as y is in all dataframes. This is the result you have.
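
    The reasoning above says that an inner join along axis 0 keeps only the columns shared by every frame. A minimal sketch of that check follows; df2 is reconstructed here as an assumption, and common_columns and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                    "y": [3, 4, 5, 6, 7]},
                   index=list("abcde"))
# Assumed second frame with columns y and z.
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9],
                    "z": [9, 8, 7, 6, 5]},
                   index=list("bcdef"))

# Columns present in both frames.
common_columns = df1.columns.intersection(df2.columns)

# Stack the rows (axis=0) and keep only the shared columns.
result = pd.concat([df1, df2], join="inner", axis=0)

print(list(common_columns))  # ['y']
print(list(result.columns))  # ['y']
print(len(result))           # 10
```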

  • 2021-02-04 03:28

    If someone needs a visual description, here is the image:
