Question
Initial problem statement
Using pandas, I would like to apply functions that are available for resample() but not for rolling().
This works:
df1 = df.resample(to_freq,
                  closed='left',
                  kind='period',
                  ).agg(OrderedDict([('Open', 'first'),
                                     ('Close', 'last'),
                                     ]))
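For reference, a self-contained version of the working resample() call might look like the sketch below. The Open/Close data and the 3-hour frequency are made up for illustration, a plain dict stands in for OrderedDict (dicts keep insertion order from Python 3.7 on), and kind='period' is left out to keep the sketch minimal.

```python
import pandas as pd

# Made-up hourly data, for illustration only
idx = pd.date_range("2020-01-01 00:00", periods=6, freq="1h")
df = pd.DataFrame({"Open": [1, 2, 3, 4, 5, 6],
                   "Close": [10, 20, 30, 40, 50, 60]}, index=idx)

# Resampler understands the string names 'first' and 'last',
# so both columns can be aggregated in one call
df1 = df.resample("3h", closed="left").agg({"Open": "first", "Close": "last"})
print(df1)
```

Each 3-hour bin keeps the first Open and the last Close of the rows it contains.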
This doesn't:
df2 = df.rolling(my_indexer).agg(
    OrderedDict([('Open', 'first'),
                 ('Close', 'last')]))
>>> AttributeError: 'first' is not a valid function for 'Rolling' object
df3 = df.rolling(my_indexer).agg(
    OrderedDict([('Close', 'last')]))
>>> AttributeError: 'last' is not a valid function for 'Rolling' object
What would you advise for taking the first and last value of each rolling window and putting them into two separate columns?
EDIT 1 - with usable input data
import pandas as pd
from random import seed
from random import randint
from collections import OrderedDict
# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0,10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)
# First & last work with resample
resampled_first = df.resample('3H',
                              closed='left',
                              kind='period',
                              ).agg(OrderedDict([('Values', 'first')]))
resampled_last = df.resample('3H',
                             closed='left',
                             kind='period',
                             ).agg(OrderedDict([('Values', 'last')]))
# They don't with rolling
rolling_first = df.rolling(3).agg(OrderedDict([('Values', 'first')]))
rolling_last = df.rolling(3).agg(OrderedDict([('Values', 'last')]))
Thanks for your help! Best regards,
Answer 1:
You can use your own function to get the first or last element of each rolling window; .iloc makes the lookup positional regardless of the index type:
rolling_first = df.rolling(3).agg(lambda rows: rows.iloc[0])
rolling_last = df.rolling(3).agg(lambda rows: rows.iloc[-1])
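A variant of the same idea, assuming the column is numeric: Rolling.apply with raw=True passes each window to the function as a NumPy array instead of a Series, so a[0] and a[-1] are plainly positional and the per-window overhead is typically lower. The series below is a small made-up example.

```python
import pandas as pd

# Small made-up series with an hourly DatetimeIndex
s = pd.Series([2, 9, 1, 4, 1],
              index=pd.date_range("2020-01-01", periods=5, freq="1h"))

# raw=True hands each window over as a numpy.ndarray,
# so [0] and [-1] are positional lookups
first = s.rolling(3).apply(lambda a: a[0], raw=True)
last = s.rolling(3).apply(lambda a: a[-1], raw=True)

out = pd.DataFrame({"Values": s, "first": first, "last": last})
print(out)
```

The first two rows stay NaN because the default min_periods equals the window size.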
Example
import pandas as pd
from random import seed, randint
# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0, 10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)
df['first'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[0])
df['last'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[-1])
print(df)
Result
                           Values  first  last
2020-01-01 00:00:00+00:00       2    NaN   NaN
2020-01-01 01:00:00+00:00       9    NaN   NaN
2020-01-01 02:00:00+00:00       1    2.0   1.0
2020-01-01 03:00:00+00:00       4    9.0   4.0
2020-01-01 04:00:00+00:00       1    1.0   1.0
2020-01-01 05:00:00+00:00       7    4.0   7.0
2020-01-01 06:00:00+00:00       7    1.0   7.0
2020-01-01 07:00:00+00:00       7    7.0   7.0
2020-01-01 08:00:00+00:00      10    7.0  10.0
2020-01-01 09:00:00+00:00       6    7.0   6.0
2020-01-01 10:00:00+00:00       3   10.0   3.0
2020-01-01 11:00:00+00:00       1    6.0   1.0
2020-01-01 12:00:00+00:00       7    3.0   7.0
2020-01-01 13:00:00+00:00       0    1.0   0.0
2020-01-01 14:00:00+00:00       6    7.0   6.0
2020-01-01 15:00:00+00:00       6    0.0   6.0
2020-01-01 16:00:00+00:00       9    6.0   9.0
2020-01-01 17:00:00+00:00       0    6.0   0.0
2020-01-01 18:00:00+00:00       7    9.0   7.0
2020-01-01 19:00:00+00:00       4    0.0   4.0
2020-01-01 20:00:00+00:00       3    7.0   3.0
2020-01-01 21:00:00+00:00       9    4.0   9.0
2020-01-01 22:00:00+00:00       1    3.0   1.0
2020-01-01 23:00:00+00:00       5    9.0   5.0
2020-01-02 00:00:00+00:00       0    1.0   0.0
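For a fixed window of n rows there is also a vectorized shortcut. It is a different technique from the one in this answer and assumes the data contains no NaN values: the first value of each full window is the value n - 1 rows earlier (shift(n - 1)), and the last value is the current row, masked to NaN until the window is full. The sketch rebuilds the question's data and checks the shortcut against the rolling/lambda version.

```python
import pandas as pd
from random import seed, randint

# Same sample data as in the question
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
df = pd.DataFrame({'Values': [randint(0, 10) for _ in ts_1h]}, index=ts_1h)

n = 3  # window length in rows

# First value of each full window = value n - 1 rows earlier
df['first'] = df['Values'].shift(n - 1)
# Last value = current value, NaN wherever the window is not yet full
df['last'] = df['Values'].where(df['first'].notna())

# Reference results from the rolling/lambda approach
ref_first = df['Values'].rolling(n).agg(lambda rows: rows.iloc[0])
ref_last = df['Values'].rolling(n).agg(lambda rows: rows.iloc[-1])

# Series.equals compares values, index and dtype (not the Series name)
print(df['first'].equals(ref_first), df['last'].equals(ref_last))
```

Because shift and where run in compiled code, this avoids calling a Python function once per window; it does not carry over to offset-based or custom windows.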
EDIT:
When using a dictionary, you have to pass the lambda itself, not a string:
result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)
The same applies to your own functions: pass the function itself, not a string with its name.
def first(rows):
    return rows.iloc[0]

def last(rows):
    return rows.iloc[-1]
result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
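First/last lookups should also work with an offset-based window, which is closer to the resample('3H') intent in the question. A sketch on made-up data, assuming a monotonic DatetimeIndex: with a string window such as '3h', each row sees the observations in (t - 3h, t], and min_periods defaults to 1, so the earliest rows come from partial windows instead of being NaN.

```python
import pandas as pd

# Made-up hourly series (offset windows need a monotonic DatetimeIndex)
s = pd.Series([2, 9, 1, 4, 1],
              index=pd.date_range("2020-01-01", periods=5, freq="1h"))

# Each row's window holds the observations in (t - 3h, t]
first = s.rolling("3h").apply(lambda a: a[0], raw=True)
last = s.rolling("3h").apply(lambda a: a[-1], raw=True)

print(pd.DataFrame({"Values": s, "first": first, "last": last}))
```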
Example
import pandas as pd
from random import seed, randint
# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0, 10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)
result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)
def first(rows):
    return rows.iloc[0]

def last(rows):
    return rows.iloc[-1]
result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
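The question passes a custom my_indexer to rolling(), but its definition is not shown. As a stand-in, the sketch below assumes a forward-looking window and uses pandas' built-in FixedForwardWindowIndexer (pandas 1.1 or later), where window i covers rows i to i + n - 1; min_periods is set explicitly so that incomplete windows at the end come out as NaN.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Made-up hourly series
s = pd.Series([2, 9, 1, 4, 1],
              index=pd.date_range("2020-01-01", periods=5, freq="1h"))

n = 3
# Stand-in for the question's my_indexer: window i covers rows i .. i + n - 1
indexer = FixedForwardWindowIndexer(window_size=n)

# min_periods=n: windows with fewer than n rows (at the end) give NaN
first = s.rolling(indexer, min_periods=n).apply(lambda a: a[0], raw=True)
last = s.rolling(indexer, min_periods=n).apply(lambda a: a[-1], raw=True)

print(pd.DataFrame({"Values": s, "first": first, "last": last}))
```

Any other BaseIndexer subclass can be passed in the same place; the first/last callables do not depend on how the window bounds are computed.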
Source: https://stackoverflow.com/questions/60940098/taking-first-and-last-value-in-a-rolling-window