问题
I'm trying to upsample weekly data to daily data, however, I'm having difficulty upsampling the last edge. How can I go about this?
import pandas as pd
import datetime
df = pd.DataFrame({'wk start': ['2018-08-12', '2018-08-12', '2018-08-19'],
'car': [ 'tesla model 3', 'tesla model x', 'tesla model 3'],
'sales':[38000,98000, 40000]})
df['wk start'] = df['wk start'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
df.set_index('wk start').groupby('car').resample('D').pad()
This returns:
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
My desired output is:
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
2018-08-20 tesla model 3 40000
2018-08-21 tesla model 3 40000
2018-08-22 tesla model 3 40000
2018-08-23 tesla model 3 40000
2018-08-24 tesla model 3 40000
2018-08-25 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
2018-08-13 tesla model x 98000
2018-08-14 tesla model x 98000
2018-08-15 tesla model x 98000
2018-08-16 tesla model x 98000
2018-08-17 tesla model x 98000
2018-08-18 tesla model x 98000
I looked at this, but they're using periods and I'm looking at datetimes. Thanks in advance!
回答1:
Yes, you are right, last edge data are excluded. Solution is add them to input DataFrame
- my solution creates a helper Dataframe
using drop_duplicates, adds 6
days and concat's to original df
before using your solution:
df1 = df.sort_values('wk start').drop_duplicates('car', keep='last').copy()
df1['wk start'] = df1['wk start'] + pd.Timedelta(6, unit='d')
df = pd.concat([df, df1], ignore_index=True)
df = df.set_index('wk start').groupby('car').resample('D').pad()
print (df)
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
2018-08-20 tesla model 3 40000
2018-08-21 tesla model 3 40000
2018-08-22 tesla model 3 40000
2018-08-23 tesla model 3 40000
2018-08-24 tesla model 3 40000
2018-08-25 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
2018-08-13 tesla model x 98000
2018-08-14 tesla model x 98000
2018-08-15 tesla model x 98000
2018-08-16 tesla model x 98000
2018-08-17 tesla model x 98000
2018-08-18 tesla model x 98000
回答2:
Assign a column for the end of each of your weeks and stack
, before using your previous groupby
attempt:
(df.assign(end=df['wk start'].add(pd.DateOffset(6))).set_index(
['car', 'sales']).stack()
.rename('wk start').reset_index([0, 1])
.set_index('wk start').groupby('car')
.resample('D').pad()
)
Output:
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
2018-08-20 tesla model 3 40000
2018-08-21 tesla model 3 40000
2018-08-22 tesla model 3 40000
2018-08-23 tesla model 3 40000
2018-08-24 tesla model 3 40000
2018-08-25 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
2018-08-13 tesla model x 98000
2018-08-14 tesla model x 98000
2018-08-15 tesla model x 98000
2018-08-16 tesla model x 98000
2018-08-17 tesla model x 98000
2018-08-18 tesla model x 98000
回答3:
you could also do:
(pd.melt(df.assign(w = df['wk start']+pd.DateOffset(6)),df.columns[1:],value_name =
"wk start").drop('variable',1).set_index('wk start').groupby('car').resample('D').pad())
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
2018-08-20 tesla model 3 40000
2018-08-21 tesla model 3 40000
2018-08-22 tesla model 3 40000
2018-08-23 tesla model 3 40000
2018-08-24 tesla model 3 40000
2018-08-25 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
2018-08-13 tesla model x 98000
2018-08-14 tesla model x 98000
2018-08-15 tesla model x 98000
2018-08-16 tesla model x 98000
2018-08-17 tesla model x 98000
2018-08-18 tesla model x 98000
来源:https://stackoverflow.com/questions/51790793/pandas-resample-upsample-last-date-edge-of-data