Why is `df.lookup` slower than `df.min`?

萌比男神i 2021-01-23 00:45

I wanted to cut some time by using lookup after idxmin, instead of calling min and idxmin. In my head, the first should be mo

1 answer

醉梦人生 (OP)

2021-01-23 01:39
If you look at the source of the `lookup` function, the implementation does not look very efficient. The source code can be found here:

http://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/core/frame.py#L3435-L3484

In particular, the main if-else body does the following:
```
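# Vectorized path: taken unless the frame is mixed-type and n is small
if not self._is_mixed_type or n > thresh:
    values = self.values
    # Translate labels to integer positions (-1 means "not found")
    ridx = self.index.get_indexer(row_labels)
    cidx = self.columns.get_indexer(col_labels)
    if (ridx == -1).any():
        raise KeyError('One or more row labels was not found')
    if (cidx == -1).any():
        raise KeyError('One or more column labels was not found')
    # Row-major offset of each (row, column) pair in the flattened array
    flat_index = ridx * len(self.columns) + cidx
    result = values.flat[flat_index]
else:
    # Fallback: one Python-level scalar lookup per label pair
    result = np.empty(n, dtype='O')
    for i, (r, c) in enumerate(zip(row_labels, col_labels)):
        result[i] = self._get_value(r, c)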
```
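To make the `flat_index` arithmetic in the excerpt concrete, here is a minimal NumPy-only sketch. The array `values` and the position arrays are made-up stand-ins for `self.values`, `ridx`, and `cidx`; it only illustrates the row-major offset trick and is not pandas' actual code path.

```python
import numpy as np

# Hypothetical 3x4 array standing in for `self.values`.
values = np.arange(12).reshape(3, 4)
n_cols = values.shape[1]

# Integer positions of the (row, column) pairs to fetch,
# i.e. what get_indexer would return for the labels.
ridx = np.array([0, 1, 2])
cidx = np.array([3, 0, 2])

# Same arithmetic as in the excerpt: offset in row-major order.
flat_index = ridx * n_cols + cidx
via_flat = values.flat[flat_index]

# Equivalent 2-D fancy indexing on the same positions.
via_pairs = values[ridx, cidx]

print(flat_index, via_flat, via_pairs)
```

Both routes pick the same elements, so the index step itself is a single vectorized operation; whatever cost remains sits in building `values` and in the two `get_indexer` calls.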
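I have not studied the `if` branch in detail, but it pays a fixed cost up front: it materializes `self.values` and calls `get_indexer` on both the index and the columns before indexing into the flat array. The alternative path is a Python-level loop that calls `_get_value` once per label pair. Try `lookup` on a frame with a very large number of rows and columns; you might get better results there.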

You should probably write your own lookup routine instead of relying on this function, so that you know exactly what work is being done and can reason about its runtime.
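As a rough sketch of what such a hand-rolled lookup could look like: the helper name `lookup_by_label` is hypothetical, the sample frame is made up, and the code assumes unique labels and a single-dtype frame so that `to_numpy()` is cheap. It deliberately avoids `df.lookup` itself, which was deprecated in later pandas releases.

```python
import numpy as np
import pandas as pd


def lookup_by_label(df, row_labels, col_labels):
    """Hypothetical helper: fetch df.loc[r, c] for each (r, c) pair at once."""
    ridx = df.index.get_indexer(row_labels)
    cidx = df.columns.get_indexer(col_labels)
    if (ridx == -1).any() or (cidx == -1).any():
        raise KeyError("One or more labels was not found")
    # Assumption: single-dtype frame, so no object-array conversion here.
    return df.to_numpy()[ridx, cidx]


# Made-up sample data.
df = pd.DataFrame(
    {"a": [3.0, 1.0, 5.0], "b": [2.0, 4.0, 0.5], "c": [9.0, 0.0, 7.0]},
    index=["x", "y", "z"],
)

# Column label of each row's minimum, then fetch those cells by label.
min_cols = df.idxmin(axis=1)
looked_up = lookup_by_label(df, df.index, min_cols.to_numpy())

# The direct route the question compares against.
row_min = df.min(axis=1).to_numpy()

print(looked_up, row_min)
```

Wrapping each route in `timeit` on your real data would show which one is cheaper for your shapes; the lookup route still does two label-to-position translations that `min` never needs.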