pandas .at versus .loc

情歌与酒 2020-11-27 15:20

I've been exploring how to optimize my code and ran across the pandas .at method. Per the documentation:

Fast label-based scalar ac

4 Answers
  • 2020-11-27 16:07

    Update: df.get_value is deprecated as of version 0.21.0. Using df.at or df.iat is the recommended method going forward.


    df.at can only access a single value at a time.

    df.loc can select multiple rows and/or columns.

    Note that there is also df.get_value, which may be even quicker at accessing single values:

    In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
    10000 loops, best of 3: 187 µs per loop
    
    In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
    100000 loops, best of 3: 8.33 µs per loop
    
    In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
    100000 loops, best of 3: 3.62 µs per loop
    

    Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
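
Since df.get_value is deprecated, a comparison today would be limited to .loc and .at. The following is a minimal sketch of such a timing run, assuming a small hypothetical frame with two-level labels on both axes (the original df is not shown in this answer). Absolute numbers will vary with the pandas version and the machine.

```python
import timeit

import pandas as pd

# Hypothetical frame with two-level labels on both axes, chosen to match
# the tuple keys used in the timings above.
rows = pd.MultiIndex.from_tuples([("a", "A"), ("b", "B")])
cols = pd.MultiIndex.from_tuples([("c", "C"), ("d", "D")])
df = pd.DataFrame([[1, 2], [3, 4]], index=rows, columns=cols)

row_key, col_key = ("a", "A"), ("c", "C")

# Both accessors return the same scalar for a single row/column pair.
loc_value = df.loc[row_key, col_key]
at_value = df.at[row_key, col_key]

# Time 1000 lookups with each accessor.
loc_seconds = timeit.timeit(lambda: df.loc[row_key, col_key], number=1000)
at_seconds = timeit.timeit(lambda: df.at[row_key, col_key], number=1000)
print(f".loc: {loc_seconds:.4f} s, .at: {at_seconds:.4f} s per 1000 lookups")
```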

  • 2020-11-27 16:16

    .at is an optimized data access method compared to .loc.

    .loc selects all the elements of a data frame located by the row labels and column labels given in its argument. Instead, .at selects a single element of a data frame, positioned at the given row label and column label.

    Also, .at takes exactly one row and one column as its input argument, whereas .loc may take multiple rows and columns. The output of .at is a single element, while the output of .loc may be a scalar, a Series, or a DataFrame.
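
The difference in return types can be illustrated with a short sketch. The frame and the variable names below are made up for the illustration and are not part of the answer.

```python
import pandas as pd

# Small illustrative frame with integer row labels.
df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

single = df.at[5, "B"]               # one row label, one column label -> scalar
column_part = df.loc[[4, 5], "B"]    # several rows, one column -> Series
block = df.loc[[4, 5], ["A", "B"]]   # several rows and columns -> DataFrame

print(type(single), type(column_part), type(block))
```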

  • 2020-11-27 16:19

    As you asked about the limitations of .at, here is one thing I recently ran into (using pandas 0.22). Let's use the example from the documentation:

    df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]], index=[4, 5, 6], columns=['A', 'B', 'C'])
    df2 = df.copy()
    
        A   B   C
    4   0   2   3
    5   0   4   1
    6  10  20  30
    

    If I now do

    df.at[4, 'B'] = 100
    

    the result looks as expected

        A    B   C
    4   0  100   3
    5   0    4   1
    6  10   20  30
    

    However, when I try to do

    df.at[4, 'C'] = 10.05
    

    it seems that .at tries to conserve the datatype (here: int):

        A    B   C
    4   0  100  10
    5   0    4   1
    6  10   20  30
    

    This appears to differ from .loc:

    df2.loc[4, 'C'] = 10.05
    

    yields the desired

        A   B      C
    4   0   2  10.05
    5   0   4   1.00
    6  10  20  30.00
    

    The risky part of the example above is that the conversion from float to int happens silently. Trying the same with a string raises an error:

    df.at[5, 'A'] = 'a_string'
    

    ValueError: invalid literal for int() with base 10: 'a_string'

    It will work, however, if one uses a string that int() can actually parse, as noted by @n1k31t4 in the comments, e.g.

    df.at[5, 'A'] = '123'
    
         A   B   C
    4    0   2   3
    5  123   4   1
    6   10  20  30
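
One way to sidestep the question of what .at does with a mismatched value is to make the column dtype explicit before writing. This is a sketch under that assumption, not something proposed in the answer; how .at treats an incompatible value has been observed with pandas 0.22 above and may differ in other versions.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=[4, 5, 6],
    columns=["A", "B", "C"],
)

# Convert the target column to float first, so the scalar write below
# does not depend on any implicit casting by .at.
df["C"] = df["C"].astype("float64")
df.at[4, "C"] = 10.05

print(df.dtypes)
print(df)
```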
    
  • 2020-11-27 16:19

    Adding to the above, Pandas documentation for the at function states:

    Access a single value for a row/column label pair.

    Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

    For setting data, loc and at are similar. For example:

    df = pd.DataFrame({'A': [1,2,3], 'B': [11,22,33]}, index=[0,0,1])
    

    Both loc and at will produce the same result

    df.at[0, 'A'] = [101,102]
    df.loc[0, 'A'] = [101,102]
    
        A   B
    0   101 11
    0   102 22
    1   3   33
    
    df.at[0, 'A'] = 103
    df.loc[0, 'A'] = 103
    
        A   B
    0   103 11
    0   103 22
    1   3   33
    

    Also, for accessing a single value, both are the same

    df.loc[1, 'A']   # returns a single value (<class 'numpy.int64'>)
    df.at[1, 'A']    # returns a single value (<class 'numpy.int64'>)
    
    3
    

    However, when a label matches multiple rows, loc will return a group of rows/columns from the DataFrame, while at will return an array of values

    df.loc[0, 'A']  # returns a Series (<class 'pandas.core.series.Series'>)
    
    0    103
    0    103
    Name: A, dtype: int64
    
    df.at[0, 'A']   # returns array of values (<class 'numpy.ndarray'>)
    
    array([103, 103])
    

    Moreover, loc can be used to match a group of rows/columns and can be given only an index label, while at must also receive the column

    df.loc[0]  # returns a DataFrame view (<class 'pandas.core.frame.DataFrame'>)
    
        A   B
    0   103 11
    0   103 22
    
    
    # df.at[0]  # ERROR: must receive column
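
Because the multi-match behaviour above only arises with repeated index labels, it can help to check for duplicates before relying on .at to return a scalar. The sketch below uses the same frame; keeping the first row per label is an arbitrary choice made for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [11, 22, 33]}, index=[0, 0, 1])

# Label 0 appears twice, so the index is not unique.
has_unique_labels = df.index.is_unique

# Keep only the first row for each label (an arbitrary choice here),
# after which every label identifies exactly one row.
deduplicated = df[~df.index.duplicated(keep="first")]
value = deduplicated.at[0, "A"]

print(has_unique_labels, value)
```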
    