Why is `df.lookup` slower than `df.min`?

萌比男神i 2021-01-23 00:45

I wanted to cut some time by using lookup after idxmin, instead of calling min and idxmin. In my head, the first should be mo

1 Answer
  •  醉梦人生
    2021-01-23 01:39

    If you look at the source of the `lookup` function, the implementation does not look very efficient. The source code can be found here:

    http://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/frame.py#L3435-L3484

    In particular, the main if-else body does the following:

    if not self._is_mixed_type or n > thresh:
        values = self.values
        ridx = self.index.get_indexer(row_labels)
        cidx = self.columns.get_indexer(col_labels)
        if (ridx == -1).any():
            raise KeyError('One or more row labels was not found')
        if (cidx == -1).any():
            raise KeyError('One or more column labels was not found')
        flat_index = ridx * len(self.columns) + cidx
        result = values.flat[flat_index]
    else:
        result = np.empty(n, dtype='O')
        for i, (r, c) in enumerate(zip(row_labels, col_labels)):
            result[i] = self._get_value(r, c)
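
    To make the flat-index arithmetic in the first branch concrete, here is a minimal NumPy-only sketch. The array and the index values are made up for illustration; `ridx` and `cidx` stand in for the positions that `get_indexer` would return.

```python
import numpy as np

# Hypothetical 3x4 array standing in for `self.values`.
values = np.arange(12).reshape(3, 4)

# Positional row/column indices (illustrative values only).
ridx = np.array([0, 1, 2])
cidx = np.array([3, 0, 2])

# Row-major flat position: row * number_of_columns + column.
flat_index = ridx * values.shape[1] + cidx
result = values.flat[flat_index]

# Paired fancy indexing picks out the same elements.
assert (result == values[ridx, cidx]).all()
print(result)  # [ 3  4 10]
```

    So this branch is a single vectorized gather, while the other branch falls back to one Python-level `_get_value` call per element.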
    

    I am not sure about the details of the `if` branch, but you might want to try this on a frame with a very large number of rows and columns; `lookup` may give better results there.
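
    A rough way to check this yourself is to time both strategies on frames of different sizes. The sketch below does not call `DataFrame.lookup` itself (as far as I know it was deprecated in pandas 1.2 and removed in 2.0); instead it imitates the two branches with the hypothetical helpers `vectorized` and `looped`. The frame shape and repeat count are arbitrary assumptions, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test frame; change the shape to see how the timings scale.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 50)))
row_labels = df.index
col_labels = df.idxmin(axis=1)  # column label of each row's minimum

def vectorized():
    # Imitates the flat-index branch.
    ridx = df.index.get_indexer(row_labels)
    cidx = df.columns.get_indexer(col_labels)
    return df.to_numpy().flat[ridx * len(df.columns) + cidx]

def looped():
    # Imitates the per-element fallback branch.
    out = np.empty(len(row_labels), dtype="O")
    for i, (r, c) in enumerate(zip(row_labels, col_labels)):
        out[i] = df.at[r, c]
    return out

for name, fn in [("vectorized", vectorized),
                 ("looped", looped),
                 ("df.min", lambda: df.min(axis=1))]:
    print(name, timeit.timeit(fn, number=3))
```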

    You should probably define your own lookup table instead of using this `lookup` function, so that you know exactly what its runtime cost is.
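
    One possible reading of "your own lookup" for the min/idxmin case in the question is sketched below. `row_min_with_label` is a hypothetical helper, not a pandas API. It assumes a numeric frame without NaN values, because `argmin` does not skip NaN the way `DataFrame.min` and `idxmin` do by default.

```python
import numpy as np
import pandas as pd

def row_min_with_label(df):
    """Return (row minima, column label of each minimum) as two Series.

    Hypothetical helper; assumes a numeric frame with no NaN values.
    """
    arr = df.to_numpy()
    pos = arr.argmin(axis=1)                  # column position of each row's minimum
    mins = arr[np.arange(arr.shape[0]), pos]  # gather the minima by position
    labels = df.columns[pos]                  # map positions back to column labels
    return pd.Series(mins, index=df.index), pd.Series(labels, index=df.index)

df = pd.DataFrame({"a": [3, 1, 5], "b": [2, 4, 0]}, index=["x", "y", "z"])
mins, labels = row_min_with_label(df)
print(mins.tolist())    # [2, 1, 0]
print(labels.tolist())  # ['b', 'a', 'b']
```

    Each step here is a single NumPy operation over the whole array, so the cost is easy to reason about: one pass for `argmin` and one gather for the values.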
