Question
I have a DataFrame like this:
df.head()
Out[2]:
price sale_date
0 477,000,000 1396/10/30
1 608,700,000 1396/10/30
2 580,000,000 1396/10/03
3 350,000,000 1396/10/03
4 328,000,000 1396/03/18
The year 1396 is out of bounds for pandas datetimes (pd.Timestamp.min falls in 1677), so I converted the dates to daily Periods as follows:
import pandas as pd

# '1396/10/30' -> 13961030
df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)

def conv(x):
    # Split the YYYYMMDD integer into its parts and build a daily Period
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)
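The same conversion can also be done straight from the 'YYYY/MM/DD' strings, without the integer arithmetic. A minimal sketch, assuming every value is a well-formed YYYY/MM/DD date; parse_sale_date is a hypothetical helper name, not a pandas function. pd.Period is not tied to the datetime64[ns] range, which is why year 1396 is accepted here.

```python
import pandas as pd

def parse_sale_date(text: str) -> pd.Period:
    # Hypothetical helper: 'YYYY/MM/DD' string -> daily Period.
    # Periods are not limited to the datetime64[ns] range, so year 1396 is fine.
    year, month, day = (int(part) for part in text.split('/'))
    return pd.Period(year=year, month=month, day=day, freq='D')

# The five sample rows from the question
df = pd.DataFrame({
    'price': ['477,000,000', '608,700,000', '580,000,000', '350,000,000', '328,000,000'],
    'sale_date': ['1396/10/30', '1396/10/30', '1396/10/03', '1396/10/03', '1396/03/18'],
})
df['sale_date'] = df['sale_date'].apply(parse_sale_date)
print(df)
```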
Now I want to resample them by day:
df.resample(freq='d', on='sale_date').sum()
but it fails with this error:
resample() got an unexpected keyword argument 'freq'
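The error itself is only about the keyword name: DataFrame.resample takes the target frequency as its first argument, rule, and has no freq parameter (freq is a parameter of pd.Grouper). A quick way to confirm this for whatever pandas version is installed, sketched with the standard inspect module:

```python
import inspect

import pandas as pd

# Parameter names that DataFrame.resample accepts in the installed pandas
params = inspect.signature(pd.DataFrame.resample).parameters
print(list(params))

# The frequency goes in as rule (usually positionally), e.g. resample('D', on=...)
print('rule' in params, 'freq' in params)
```

Fixing the keyword alone does not solve the problem here, though: as the answer below reports, resampling these Periods still fails with OutOfBoundsDatetime in pandas 1.1.3.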
Answer 1:
It seems that resample and Grouper do not work with Periods here, at least for me in pandas 1.1.3 (I guess it is a bug):
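import pandas as pd

df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)
# Remove the thousands separators so price can be summed as numbers
df['price'] = df['price'].str.replace(',', '').astype(int)

def conv(x):
    # Split the YYYYMMDD integer into its parts and build a daily Period
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)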
# df = df.set_index('sale_date').resample('D')['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
# df = df.set_index('sale_date').groupby(pd.Grouper(freq='D'))['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
A possible solution is to aggregate with sum, so that if a sale_date is duplicated, its price values are summed:
df = df.groupby('sale_date')['price'].sum().reset_index()
print (df)
sale_date price
0 1396-03-18 328000000
1 1396-10-03 580000000
2 1396-10-30 477000000
3 1396-11-25 608700000
4 1396-12-05 350000000
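Run against the five sample rows from the question, where two sales share 1396/10/30 and two share 1396/10/03, the same groupby collapses each date into a single total. A self-contained sketch; to_period is a hypothetical helper, and the prices are assumed to be whole numbers once the commas are removed:

```python
import pandas as pd

def to_period(text):
    # Hypothetical helper: 'YYYY/MM/DD' -> daily Period
    year, month, day = map(int, text.split('/'))
    return pd.Period(year=year, month=month, day=day, freq='D')

df = pd.DataFrame({
    'price': ['477,000,000', '608,700,000', '580,000,000', '350,000,000', '328,000,000'],
    'sale_date': ['1396/10/30', '1396/10/30', '1396/10/03', '1396/10/03', '1396/03/18'],
})
# Drop the thousands separators so prices add up as numbers, not strings
df['price'] = df['price'].str.replace(',', '', regex=False).astype('int64')
df['sale_date'] = df['sale_date'].apply(to_period)

# One row per sale_date; duplicated dates get their prices summed
daily = df.groupby('sale_date')['price'].sum().reset_index()
print(daily)
```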
EDIT: The missing days can also be filled in with Series.reindex and period_range:
s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
df = s.reindex(rng, fill_value=0).reset_index()
print (df)
sale_date price
0 1396-03-18 328000000
1 1396-03-19 0
2 1396-03-20 0
3 1396-03-21 0
4 1396-03-22 0
.. ... ...
258 1396-12-01 0
259 1396-12-02 0
260 1396-12-03 0
261 1396-12-04 0
262 1396-12-05 350000000
[263 rows x 2 columns]
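The groupby-then-reindex steps can be wrapped in a small function and run on the question's five rows. A sketch under stated assumptions: fill_missing_days and to_period are hypothetical names, price is already numeric, and days without sales should count as 0.

```python
import pandas as pd

def fill_missing_days(df, date_col='sale_date', value_col='price'):
    # Hypothetical helper: sum per day, then add every missing day between
    # the first and last sale with a value of 0.
    s = df.groupby(date_col)[value_col].sum()
    rng = pd.period_range(s.index.min(), s.index.max(), freq='D', name=date_col)
    return s.reindex(rng, fill_value=0).reset_index()

def to_period(text):
    # Hypothetical helper: 'YYYY/MM/DD' -> daily Period
    year, month, day = map(int, text.split('/'))
    return pd.Period(year=year, month=month, day=day, freq='D')

df = pd.DataFrame({
    'price': [477_000_000, 608_700_000, 580_000_000, 350_000_000, 328_000_000],
    'sale_date': [to_period(d) for d in
                  ['1396/10/30', '1396/10/30', '1396/10/03', '1396/10/03', '1396/03/18']],
})
result = fill_missing_days(df)
print(result)
```

With this data the range runs from 1396-03-18 to 1396-10-30, so the result has one row for each of those 227 days; only the three dates with sales are non-zero.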
Source: https://stackoverflow.com/questions/64731501/how-can-i-resample-pandas-dataframe-by-day-on-period-time