I wanted to cut some time by using lookup after idxmin, instead of calling min and idxmin. In my head, the first should be mo
If you look at the source code of the lookup function, it does not look very efficient. The source can be found here:
http://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/frame.py#L3435-L3484
In particular, the main if-else body does the following:

    if not self._is_mixed_type or n > thresh:
        values = self.values
        ridx = self.index.get_indexer(row_labels)
        cidx = self.columns.get_indexer(col_labels)
        if (ridx == -1).any():
            raise KeyError('One or more row labels was not found')
        if (cidx == -1).any():
            raise KeyError('One or more column labels was not found')
        flat_index = ridx * len(self.columns) + cidx
        result = values.flat[flat_index]
    else:
        result = np.empty(n, dtype='O')
        for i, (r, c) in enumerate(zip(row_labels, col_labels)):
            result[i] = self._get_value(r, c)

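To make the flat-index arithmetic in the first branch concrete, here is a minimal NumPy-only sketch. The array and the index values are made up for illustration; they stand in for self.values and for the positions that get_indexer would return when every label is found.

```python
import numpy as np

# Hypothetical 3x4 array standing in for the values of a single-dtype frame.
values = np.arange(12).reshape(3, 4)

# Positional row and column indices (illustrative values, one pair per lookup).
ridx = np.array([0, 2, 1])
cidx = np.array([3, 0, 2])

# Row-major flat position: row * number_of_columns + column.
flat_index = ridx * values.shape[1] + cidx
result = values.flat[flat_index]

# Paired fancy indexing selects the same elements.
same = values[ridx, cidx]
```

Both forms pick one element per (row, column) pair with array operations, with no Python-level loop over the pairs.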
I am not sure about the detailed implementation of the if branch, but the else branch loops in Python and calls _get_value once per label pair. You might want to try this on a frame with a very large number of rows and columns, where you may get better results from the lookup function.
You should probably define your own lookup table rather than use this lookup function, so that you know exactly what the runtime will be.
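As one possible way to do that for the min/idxmin case from the question, here is a minimal sketch that works on the underlying NumPy array. The helper name row_min_with_label and the sample frame are hypothetical, not part of pandas, and the sketch assumes a single numeric dtype with no NaN values. It also avoids DataFrame.lookup altogether, which was deprecated in pandas 1.2 and removed in 2.0.

```python
import numpy as np
import pandas as pd


def row_min_with_label(df):
    """Return (row minima, column labels of the minima) as two Series.

    Hypothetical helper, not a pandas API. Assumes one numeric dtype
    and no NaN values in df.
    """
    values = df.to_numpy()
    # Column position of the minimum in each row.
    pos = values.argmin(axis=1)
    # Pick one element per row with paired fancy indexing.
    mins = values[np.arange(len(df)), pos]
    # Map the positions back to column labels.
    labels = df.columns.to_numpy()[pos]
    return pd.Series(mins, index=df.index), pd.Series(labels, index=df.index)


# Made-up sample data for illustration.
df = pd.DataFrame(
    {"a": [3.0, 1.0, 5.0], "b": [2.0, 4.0, 0.5], "c": [9.0, 0.0, 7.0]},
    index=["x", "y", "z"],
)
mins, labels = row_min_with_label(df)
```

Whether this is actually faster than calling min and idxmin depends on the frame's shape and dtype, so it is worth timing both with timeit on your own data.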