Question
I use a custom window function in pandas. This works fine for things like .mean() and .sum(). As I understand it, other aggregations like .count() and .min() used to have problems but should be fixed by now. Currently .count() uses the internal roll_count function, AFAICS. But I still don't get the expected results:
import numpy as np
import pandas as pd

# Use largest most recent multiple of *modulo* past measurements:
class ModuloIndexer(pd.api.indexers.BaseIndexer):
    def get_window_bounds(self, num_values, min_periods, center, closed):
        end = np.arange(1, num_values + 1, dtype=np.int64)
        start = end % self.modulo
        return start, end

s = pd.Series(2 ** np.arange(8))  # [1, 2, 4, 8, 16, 32, 64, 128]
r = s.rolling(ModuloIndexer(s.index, modulo=4))

print(r.sum())          # Correct: [0, 0, 0, 15, 30, 60, 120, 255]
print(r.apply(len))     # Correct: [0, 0, 0, 4, 4, 4, 4, 8]
print(r.count())        # Weird: [nan, nan, nan, 1, 1, 1, 1, 2]
print(r.apply(np.min))  # Correct: [nan, nan, nan, 1, 2, 4, 8, 1]
print(r.min())          # Weird: [nan, nan, nan, 8, 8, 8, 8, 8]

Am I doing something wrong or is this a bug I should report?
PS: Use apply(len) as a workaround only when no NaNs exist!
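
For data that does contain NaNs, one possible variant of the apply(len) workaround is to count only the non-NaN entries of each window. This is a minimal sketch, not a fix for .count() itself. It makes three assumptions that are not in the code above: the helper count_valid is a hypothetical name, step=None is added to the signature because newer pandas releases pass a step argument to get_window_bounds, and min_periods=0 is passed explicitly so that empty windows are still handed to the function.

```python
import numpy as np
import pandas as pd


class ModuloIndexer(pd.api.indexers.BaseIndexer):
    # Same indexer as in the question. `step=None` is an assumption added
    # so the signature also works on pandas versions that pass `step`.
    def get_window_bounds(self, num_values, min_periods, center, closed, step=None):
        end = np.arange(1, num_values + 1, dtype=np.int64)
        start = end % self.modulo
        return start, end


def count_valid(window):
    # Hypothetical helper: number of non-NaN entries in the raw ndarray
    # window; returns 0 for an empty window.
    return np.count_nonzero(~np.isnan(window))


# Same layout as the example series, but with one NaN at position 2.
s = pd.Series([1.0, 2.0, np.nan, 8.0, 16.0, 32.0, 64.0, 128.0])

# min_periods=0 is passed explicitly (an assumption about the desired
# behaviour) so that empty windows are evaluated instead of skipped.
r = s.rolling(ModuloIndexer(s.index, modulo=4), min_periods=0)

# raw=True passes each window as a NumPy array rather than a Series.
valid_counts = r.apply(count_valid, raw=True)
print(valid_counts.tolist())
```

Because this goes through apply, it evaluates each window slice independently in Python, so it is likely to be slower than the built-in .count() on large inputs.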
Source: https://stackoverflow.com/questions/64984049/baseindexer-still-broken-with-count-and-min