Question
I'm looking for the time complexity of .at and .loc in pandas as a function of n, the number of rows in a dataframe.
Another way of asking this question: are pandas dataframe indexes B-trees (with O(log n) lookups) or hash tables (with constant-time lookups)?
I'm asking because I'd like a way to do constant-time row lookups in a dataframe based on a custom index.
Answer 1:
Alright, so it appears that:
1) You can build your own index on a dataframe with .set_index in O(n) time, where n is the number of rows in the dataframe.
2) The index's lookup table is built lazily (in O(n) time) the first time you access a row by that index, so the first such access takes O(n) time.
3) All subsequent row accesses take constant time (on average).
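The three steps above can be sketched in Python as follows. This is a minimal illustration, not code from the original answer: the column names user_id and score and the data are hypothetical, and the lazy construction of the lookup table is a pandas implementation detail rather than a documented guarantee.

```python
import pandas as pd

# Hypothetical example data: 100,000 rows keyed by a string "user_id".
n = 100_000
df = pd.DataFrame({
    "user_id": [f"u{i}" for i in range(n)],
    "score": range(n),
})

# Step 1: make "user_id" the index. This is O(n): it builds a new Index object.
df = df.set_index("user_id")

# Step 2: the first label lookup typically builds the index's internal
# lookup table (an O(n) one-off cost; implementation detail of pandas).
first = df.loc["u42", "score"]

# Step 3: later lookups reuse that table, so each is roughly constant time.
later = df.at["u99999", "score"]

print(first, later)
```

.loc and .at both go through the index's label lookup here; .at is restricted to a single scalar cell, which makes it the lighter choice for one-value lookups.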
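So it looks like the indexes are hash tables and not B-trees, at least in the common case of a unique index (pandas' internal index engine can use other strategies in some cases, such as binary search on very large sorted indexes).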
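A rough way to observe this behaviour is to time the first lookup against later ones. The sketch below assumes a shuffled integer key column as the custom index (shuffled so the index is not sorted), and the function name time_lookups is hypothetical. Absolute timings depend on the machine and pandas version, so treat the numbers only as an indication that the first lookup pays an extra one-off cost.

```python
import time

import numpy as np
import pandas as pd


def time_lookups(n: int = 200_000, repeats: int = 1_000) -> tuple[float, float]:
    """Return (seconds for the first .at lookup, mean seconds per later lookup)."""
    rng = np.random.default_rng(0)
    # Hypothetical custom index: a random permutation of 0..n-1 (unsorted, unique).
    keys = rng.permutation(n)
    df = pd.DataFrame({"key": keys, "value": np.arange(n)}).set_index("key")

    start = time.perf_counter()
    df.at[int(keys[0]), "value"]  # first lookup: typically pays the O(n) table build
    first = time.perf_counter() - start

    probes = rng.choice(keys, size=repeats)
    start = time.perf_counter()
    for k in probes:
        df.at[int(k), "value"]  # later lookups reuse the already-built table
    later = (time.perf_counter() - start) / repeats
    return first, later


first, later = time_lookups()
print(f"first lookup: {first * 1e3:.2f} ms, later lookups: {later * 1e6:.2f} us each")
```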
Source: https://stackoverflow.com/questions/58876676/what-is-the-time-complexity-of-at-and-loc-in-pandas