I've noticed that there are several uses of pd.DataFrame.groupby followed by an apply
implicitly assuming that groupby
is stable - that is, if
Although the docs don't state this, internally it uses a stable sort when generating the groups.
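A quick way to see this for yourself is the minimal sketch below. The frame and the column names key and val are made up for illustration; it only shows the observable behaviour on a small example, not a guarantee from the documentation.

```python
import pandas as pd

# Hypothetical data: the keys are interleaved so that any reordering
# inside a group would be easy to spot.
df = pd.DataFrame({
    "key": ["b", "a", "b", "a", "b"],
    "val": [1, 2, 3, 4, 5],
})

# Collect each group's values in the order groupby hands them to apply.
per_group = df.groupby("key")["val"].apply(list)
print(per_group)
```

Within each group the values come out in the same relative order they had in the original frame, which is what you would expect if the grouping sort is stable.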
See:
As I mentioned in the comments, this matters if you consider transform, which returns a Series with its index aligned to the original df. If the sort didn't preserve the order, alignment would have to do additional work, since the Series would need to be re-sorted prior to assigning. In fact, this is mentioned in the code comments:
_algos.groupsort_indexer
implements counting sort and it is at least O(ngroups)
, where
ngroups = prod(shape)
shape = map(len, keys)
That is, linear in the number of combinations (cartesian product) of unique values of groupby keys. This can be huge when doing multi-key groupby.
np.argsort(kind='mergesort')
is O(count x log(count))
where count is the length of the data-frame; Both algorithms are stable sort and that is necessary for correctness of groupby operations. e.g. consider:
df.groupby(key)[col].transform('first')
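To make the quoted comment concrete, here is a rough sketch under my own assumptions. The frame, the column names k1, k2 and col, and the use of pd.factorize to get integer group codes are illustrative only; this is not how pandas wires things up internally, it just mirrors the quantities the comment talks about.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two grouping keys and one value column.
df = pd.DataFrame({
    "k1": ["x", "y", "x", "y", "x"],
    "k2": [1, 1, 2, 1, 1],
    "col": [10, 20, 30, 40, 50],
})

# ngroups = prod(shape), shape = map(len, keys): the number of *possible*
# key combinations, not the number of combinations that actually occur.
shape = [df["k1"].nunique(), df["k2"].nunique()]
ngroups = int(np.prod(shape))

# Stand-in for group codes: factorize labels values by order of appearance.
codes = pd.factorize(df["k1"])[0]

# A stable argsort keeps rows with equal codes in their original order.
order = np.argsort(codes, kind="mergesort")

# 'first' is only well defined per group if that order is preserved.
first = df.groupby("k1")["col"].transform("first")

print(ngroups)
print(order.tolist())
print(first.tolist())
```

With a stable sort, the rows of each group stay in their original relative order, so 'first' picks the row that appeared first in df, and the result comes back on df's own index.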