Vectorized lookup on a pandas dataframe

不知归路 2020-11-27 23:29

I have two DataFrames . . .

df1 is a table I need to pull values from using index, column pairs retrieved from multiple columns in df2.

I see t

3 Answers
  • 2020-11-28 00:19

    There's a function aptly named lookup that does exactly this.

    df2['looked_up'] = df1.lookup(df2.animal, df2.letter)
    
    df2
    
        0   1   2   3   4 animal letter  looked_up
    0   0   1   2   3   4    cat      a          0
    1   5   6   7   8   9    dog      b          6
    2  10  11  12  13  14   fish      c         12
    3  15  16  17  18  19   bird      d         18
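
    Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. On newer versions, one possible substitute is to translate the labels into integer positions with Index.get_indexer and index the underlying array. The sketch below assumes df1 and df2 look the way the output above suggests (the frames are reconstructed here, not copied from the question) and that every (animal, letter) pair exists in df1.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frames, inferred from the output above.
df1 = pd.DataFrame(np.arange(20).reshape(4, 5),
                   index=["cat", "dog", "fish", "bird"],
                   columns=list("abcde"))
df2 = pd.DataFrame(np.arange(20).reshape(4, 5))
df2["animal"] = ["cat", "dog", "fish", "bird"]
df2["letter"] = list("abcd")

# Convert row and column labels to integer positions.
rows = df1.index.get_indexer(df2["animal"])
cols = df1.columns.get_indexer(df2["letter"])

# Pick one value per (row, column) pair with NumPy integer-array indexing.
df2["looked_up"] = df1.to_numpy()[rows, cols]
print(df2)
```

    get_indexer returns -1 for a label it cannot find, and -1 would silently select the last row or column here, so this form is only safe when all labels are known to be present.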
    
  • If you want a slightly faster approach on a small DataFrame, zip helps, i.e.

    k = list(zip(df2['animal'].values,df2['letter'].values))
    df2['looked_up'] = [df1.get_value(*i) for i in k]
    

    Output:

       0   1   2   3   4 animal letter  looked_up
    0   0   1   2   3   4    cat      a          0
    1   5   6   7   8   9    dog      b          6
    2  10  11  12  13  14   fish      c         12
    3  15  16  17  18  19   bird      d         18
    

    As John suggested, the code can be simplified, which is also much faster:

    df2['looked_up'] = [df1.get_value(r, c) for r, c in zip(df2.animal, df2.letter)]
    

    In case of missing data, use a conditional expression, i.e.

    df2['looked_up'] = [df1.get_value(r, c) if not pd.isnull(c) | pd.isnull(r) else pd.np.nan for r, c in zip(df2.animal, df2.letter) ]
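
    DataFrame.get_value was removed in pandas 1.0 and pd.np in 2.0, so the snippets above only run on older versions. A minimal sketch of the same idea with the scalar accessor .at follows; the df1 values and the small pairs frame (with one missing animal) are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed lookup table, matching the values shown in the outputs above.
df1 = pd.DataFrame(np.arange(20).reshape(4, 5),
                   index=["cat", "dog", "fish", "bird"],
                   columns=list("abcde"))

# Hypothetical pairs to look up; the last row has a missing animal.
pairs = pd.DataFrame({"animal": ["cat", "dog", None],
                      "letter": ["a", "b", "c"]})

# .at does the scalar label lookup that get_value used to do;
# a pair with a missing label yields NaN instead of raising.
looked_up = [
    df1.at[r, c] if pd.notna(r) and pd.notna(c) else np.nan
    for r, c in zip(pairs["animal"], pairs["letter"])
]
print(looked_up)
```

    This guards only against missing labels. A label that is present but absent from df1 would still raise a KeyError.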
    

    Timings for a small DataFrame:

    %%timeit
    df2['looked_up'] = df1.lookup(df2.animal, df2.letter)
    1000 loops, best of 3: 801 µs per loop
    
    k = list(zip(df2['animal'].values,df2['letter'].values))
    df2['looked_up'] = [df1.get_value(*i) for i in k]
    1000 loops, best of 3: 399 µs per loop
    
    [df1.get_value(r, c) for r, c in zip(df2.animal, df2.letter)]
    10000 loops, best of 3: 87.5 µs per loop
    

    For a large DataFrame:

    df3 = pd.concat([df2]*10000)
    
    %%timeit
    k = list(zip(df3['animal'].values,df3['letter'].values))
    df3['looked_up'] = [df1.get_value(*i) for i in k]
    1 loop, best of 3: 185 ms per loop
    
    df3['looked_up'] = [df1.get_value(r, c) for r, c in zip(df3.animal, df3.letter)]
    1 loop, best of 3: 165 ms per loop
    
    df3['looked_up'] = df1.lookup(df3.animal, df3.letter)
    100 loops, best of 3: 8.82 ms per loop
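
    %%timeit only works inside IPython, and both get_value and lookup are gone from current pandas, so the comparison cannot be rerun as written. The sketch below shows one way to repeat a similar comparison with the standard timeit module. It uses .at and get_indexer as stand-ins, and the frames, function names and sizes are assumptions. It prints whatever timings your machine produces and makes no claim about them.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed lookup table and pairs, matching the values used in this thread.
df1 = pd.DataFrame(np.arange(20).reshape(4, 5),
                   index=["cat", "dog", "fish", "bird"],
                   columns=list("abcde"))
small = pd.DataFrame({"animal": ["cat", "dog", "fish", "bird"],
                      "letter": list("abcd")})
big = pd.concat([small] * 1000, ignore_index=True)  # 4000 rows


def loop_lookup(frame):
    """Python-level loop with scalar .at access (stand-in for get_value)."""
    return [df1.at[r, c] for r, c in zip(frame["animal"], frame["letter"])]


def indexer_lookup(frame):
    """Vectorized positional lookup (stand-in for DataFrame.lookup)."""
    rows = df1.index.get_indexer(frame["animal"])
    cols = df1.columns.get_indexer(frame["letter"])
    return df1.to_numpy()[rows, cols]


for name, func in [("loop", loop_lookup), ("indexer", indexer_lookup)]:
    seconds = timeit.timeit(lambda: func(big), number=5)
    print(f"{name}: {seconds / 5 * 1e3:.2f} ms per call")
```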
    
  • 2020-11-28 00:32

    lookup and get_value are great answers if every (row, column) pair exists in the lookup DataFrame.

    However, if some (row, column) pairs are not present in the lookup DataFrame and you want the looked-up value to be NaN for them, merge with stack is one way to do it:

    In [206]: df2.merge(df1.stack().reset_index().rename(columns={0: 'looked_up'}),
                        left_on=['animal', 'letter'], right_on=['level_0', 'level_1'],
                        how='left').drop(['level_0', 'level_1'], axis=1)
    Out[206]:
        0   1   2   3   4 animal letter  looked_up
    0   0   1   2   3   4    cat      a          0
    1   5   6   7   8   9    dog      b          6
    2  10  11  12  13  14   fish      c         12
    3  15  16  17  18  19   bird      d         18
    

    Testing with a non-existent (animal, letter) pair added:

    In [207]: df22
    Out[207]:
          0     1     2     3     4 animal letter
    0   0.0   1.0   2.0   3.0   4.0    cat      a
    1   5.0   6.0   7.0   8.0   9.0    dog      b
    2  10.0  11.0  12.0  13.0  14.0   fish      c
    3  15.0  16.0  17.0  18.0  19.0   bird      d
    4   NaN   NaN   NaN   NaN   NaN  dummy    NaN
    
    In [208]: df22.merge(df1.stack().reset_index().rename(columns={0: 'looked_up'}),
                        left_on=['animal', 'letter'], right_on=['level_0', 'level_1'],
                        how='left').drop(['level_0', 'level_1'], axis=1)
    Out[208]:
          0     1     2     3     4 animal letter  looked_up
    0   0.0   1.0   2.0   3.0   4.0    cat      a        0.0
    1   5.0   6.0   7.0   8.0   9.0    dog      b        6.0
    2  10.0  11.0  12.0  13.0  14.0   fish      c       12.0
    3  15.0  16.0  17.0  18.0  19.0   bird      d       18.0
    4   NaN   NaN   NaN   NaN   NaN  dummy    NaN        NaN
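
    The construction of df22 is not shown above, so here is a self-contained sketch of the same merge-and-stack idea. The frames are assumptions modelled on the output. Naming the stacked index levels up front (via rename_axis, a variation on the code above) means there are no level_0 / level_1 columns to drop afterwards.

```python
import numpy as np
import pandas as pd

# Assumed lookup table, matching the values shown in the outputs above.
df1 = pd.DataFrame(np.arange(20).reshape(4, 5),
                   index=["cat", "dog", "fish", "bird"],
                   columns=list("abcde"))

# Hypothetical pairs, including one that does not exist in df1.
pairs = pd.DataFrame({"animal": ["cat", "dog", "fish", "bird", "dummy"],
                      "letter": ["a", "b", "c", "d", np.nan]})

# Reshape df1 to long form: one row per (animal, letter) with its value.
long_form = (df1.stack()
                .rename("looked_up")
                .rename_axis(["animal", "letter"])
                .reset_index())

# A left merge keeps every row of pairs; unmatched pairs get NaN.
result = pairs.merge(long_form, on=["animal", "letter"], how="left")
print(result)
```

    Because the unmatched row introduces NaN, the looked_up column comes back as float, just as in Out[208] above.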
    