Getting the index of a row in a pandas apply function

灰色年华 2020-11-29 21:48

I am trying to access the index of a row in a function applied across an entire DataFrame in Pandas. I have something like this:

df = pandas.Dat         


        
3 Answers
  • 2020-11-29 22:20

    To access the index in this case, use the row's name attribute:

    In [182]:
    
    df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
    def rowFunc(row):
        return row['a'] + row['b'] * row['c']
    
    def rowIndex(row):
        return row.name
    df['d'] = df.apply(rowFunc, axis=1)
    df['rowIndex'] = df.apply(rowIndex, axis=1)
    df
    Out[182]:
       a  b  c   d  rowIndex
    0  1  2  3   7         0
    1  4  5  6  34         1
    

    Note that if this is really all you are trying to do, the following works and is much faster:

    In [198]:
    
    df['d'] = df['a'] + df['b'] * df['c']
    df
    Out[198]:
       a  b  c   d
    0  1  2  3   7
    1  4  5  6  34
    
    In [199]:
    
    %timeit df['a'] + df['b'] * df['c']
    %timeit df.apply(rowIndex, axis=1)
    10000 loops, best of 3: 163 µs per loop
    1000 loops, best of 3: 286 µs per loop
    

    EDIT

    Looking at this question 3+ years later, you could just do:

    In[15]:
    df['d'],df['rowIndex'] = df['a'] + df['b'] * df['c'], df.index
    df
    
    Out[15]: 
       a  b  c   d  rowIndex
    0  1  2  3   7         0
    1  4  5  6  34         1
    

    If your real rowFunc is less trivial than this, you should still look for vectorised functions and combine them with the df index:

    In[16]:
    df['newCol'] = df['a'] + df['b'] + df['c'] + df.index
    df
    
    Out[16]: 
       a  b  c   d  rowIndex  newCol
    0  1  2  3   7         0       6
    1  4  5  6  34         1      16
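
As a rough sketch of the same idea with a conditional rule, the index can also drive a vectorised expression such as a boolean mask. The column name flagged and the even-index rule below are purely illustrative assumptions, not something from the original question; the apply version is included only to show that both routes give the same values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

# Hypothetical rule: use a + b where the index label is even, otherwise use c.
# The condition is evaluated on the whole index at once, not row by row.
df['flagged'] = np.where(df.index % 2 == 0, df['a'] + df['b'], df['c'])

# The same rule written with apply and row.name (typically slower).
via_apply = df.apply(
    lambda row: row['a'] + row['b'] if row.name % 2 == 0 else row['c'],
    axis=1,
)
```

This assumes an integer index; for string or datetime labels the mask would need a condition suited to that index type.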
    
  • 2020-11-29 22:22

    To answer the original question: yes, you can access the index value of a row in apply(). It is available as the name attribute of the row, and it requires axis=1 (so that the lambda receives each row, not each column).

    Working example (pandas 0.23.4):

    >>> import pandas as pd
    >>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
    >>> df.set_index('a', inplace=True)
    >>> df
       b  c
    a      
    1  2  3
    4  5  6
    >>> df['index_x10'] = df.apply(lambda row: 10*row.name, axis=1)
    >>> df
       b  c  index_x10
    a                 
    1  2  3         10
    4  5  6         40
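
To make the role of axis=1 concrete, here is a minimal sketch that contrasts the two axes on the same frame. The variable names row_labels and col_labels are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c']).set_index('a')

# axis=1: the function receives one row at a time as a Series,
# and .name is that row's index label.
row_labels = df.apply(lambda row: row.name, axis=1)

# axis=0 (the default): the function receives one column at a time,
# and .name is the column label instead.
col_labels = df.apply(lambda col: col.name)
```

With axis=1 the result holds the index labels 1 and 4; with the default axis it holds the column labels 'b' and 'c', which is why omitting axis=1 does not give the row index.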
    
  • 2020-11-29 22:38

    Either:

    1. with row.name inside the apply(..., axis=1) call:

    import pandas

    df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'], index=['x','y'])
    
       a  b  c
    x  1  2  3
    y  4  5  6
    
    df.apply(lambda row: row.name, axis=1)
    
    x    x
    y    y
    

    2. with iterrows() (slower)

    DataFrame.iterrows() allows you to iterate over rows, and access their index:

    for idx, row in df.iterrows():
        ...
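
A filled-in version of the loop might look like the sketch below. The list names are illustrative, and itertuples() is shown only as a related alternative that the answer itself does not mention.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'], index=['x', 'y'])

# iterrows() yields (index label, row as a Series) pairs.
labels_iterrows = []
for idx, row in df.iterrows():
    labels_iterrows.append((idx, row['a']))

# itertuples() yields namedtuples; the index label is the first field, named Index.
labels_itertuples = [(t.Index, t.a) for t in df.itertuples()]
```

Both loops collect the index label together with the value of column a for each row.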
    