Question
Problem
I loaded a CSV into a DataFrame that has some gaps in the datetime column. The sampling frequency is 15 min, and for each timestamp there is always a block of three values. In this example the block for the datetime 2017-12-11 23:15:00 is missing.
ID Datetime Value
0 a 2017-12-11 23:00:00 20.0
1 b 2017-12-11 23:00:00 20.9
2 c 2017-12-11 23:00:00 21.0
3 a 2017-12-11 23:30:00 19.8
4 b 2017-12-11 23:30:00 20.8
5 c 2017-12-11 23:30:00 20.8
Desired result
What I want to do is resample on Datetime and fill the gaps in Value with zeros:
ID Datetime Value
0 a 2017-12-11 23:00:00 20.0
1 b 2017-12-11 23:00:00 20.9
2 c 2017-12-11 23:00:00 21.0
3 a 2017-12-11 23:15:00 0.0
4 b 2017-12-11 23:15:00 0.0
5 c 2017-12-11 23:15:00 0.0
6 a 2017-12-11 23:30:00 19.8
7 b 2017-12-11 23:30:00 20.8
8 c 2017-12-11 23:30:00 20.8
My Question
Is it possible to accomplish this with resample(), or is a solution possible in combination with groupby()?
import pandas as pd
df = pd.concat((pd.read_csv(file, parse_dates=[1], dayfirst=True,
                            names=headers) for file in all_files))
df.set_index("Datetime").resample('15min').fillna(0).reset_index()
Answer 1:
You can use resample, with last or an average to aggregate in case there are multiple values for a single timestamp.
df.groupby('ID').resample('15min').last().fillna(0)
This will resample the dataframe and take the last value in each sample period (mostly there should be 0 or 1 values per period). Where a period has an index entry (time) but no value, fillna(0) inserts a 0 instead of NaN.
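To see how last and an average differ when a bin holds more than one reading, here is a minimal sketch. The data is made up for illustration (ID a gets a hypothetical extra reading at 23:05), and it assumes Datetime is already the index.

```python
import pandas as pd

# Hypothetical data: ID "a" has two readings inside the 23:00 bin
# and nothing at all in the 23:15 bin.
df = pd.DataFrame(
    {
        "ID": ["a", "a", "a"],
        "Datetime": pd.to_datetime(
            ["2017-12-11 23:00:00", "2017-12-11 23:05:00", "2017-12-11 23:30:00"]
        ),
        "Value": [20.0, 22.0, 19.8],
    }
).set_index("Datetime")

# Keep the latest reading in each 15-minute bin, 0 for empty bins.
last_per_bin = df.groupby("ID")["Value"].resample("15min").last().fillna(0)

# Average the readings in each 15-minute bin, 0 for empty bins.
mean_per_bin = df.groupby("ID")["Value"].resample("15min").mean().fillna(0)

print(last_per_bin)
print(mean_per_bin)
```

With one reading per bin both give the same result, so the choice only matters when timestamps can fall between the 15-minute marks.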
Note that this only works if the index has the appropriate type. I see you are parsing dates; calling df.dtypes lets you confirm that the Datetime column really has a datetime type. I would recommend setting the index to 'Datetime' and leaving it there if you plan on doing many time-based operations (i.e., do this before the above command!).
df.set_index('Datetime', inplace=True)
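A small sketch of that check, assuming a hypothetical frame in which Datetime came in as plain strings. pd.to_datetime is one way to convert the column if parse_dates did not do the job.

```python
import pandas as pd

# Hypothetical frame where Datetime was read as strings.
df = pd.DataFrame(
    {
        "ID": ["a", "b", "c"],
        "Datetime": ["2017-12-11 23:00:00"] * 3,
        "Value": [20.0, 20.9, 21.0],
    }
)

print(df.dtypes)  # Datetime is not datetime64 yet

# Convert explicitly, then move the column into the index.
df["Datetime"] = pd.to_datetime(df["Datetime"])
df = df.set_index("Datetime")

# resample() needs a datetime-like index such as DatetimeIndex.
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)
print(is_datetime_index)
```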
Running the groupby/resample command on the indexed frame results in the new MultiIndex DataFrame below:
Out[76]:
                        ID  Value
ID Datetime
a  2018-02-26 23:00:00   a   20.0
   2018-02-26 23:15:00   0    0.0
   2018-02-26 23:30:00   a   19.8
b  2018-02-26 23:00:00   b   20.9
   2018-02-26 23:15:00   0    0.0
   2018-02-26 23:30:00   b   20.8
c  2018-02-26 23:00:00   c   21.0
   2018-02-26 23:15:00   0    0.0
   2018-02-26 23:30:00   c   20.8
And if you're only after the Value series, with a bit more moving and shaking we can end up with a slightly different dataframe that has only a single index. This has the benefit of not having odd values in the ID column (see the 0 entries above):
(df.groupby('ID')['Value']
.resample('15min')
.last()
.fillna(0)
.reset_index()
.set_index('Datetime')
.sort_index())
Out[107]:
ID Value
Datetime
2018-02-26 23:00:00 a 20.0
2018-02-26 23:00:00 b 20.9
2018-02-26 23:00:00 c 21.0
2018-02-26 23:15:00 a 0.0
2018-02-26 23:15:00 b 0.0
2018-02-26 23:15:00 c 0.0
2018-02-26 23:30:00 a 19.8
2018-02-26 23:30:00 b 20.8
2018-02-26 23:30:00 c 20.8
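For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the sample frame from the question by hand (instead of reading the CSV files) and sorts the result back into the layout asked for. The variable name filled is just for illustration.

```python
import pandas as pd

# Sample data from the question; the 23:15 block is missing.
df = pd.DataFrame(
    {
        "ID": ["a", "b", "c", "a", "b", "c"],
        "Datetime": pd.to_datetime(
            ["2017-12-11 23:00:00"] * 3 + ["2017-12-11 23:30:00"] * 3
        ),
        "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
    }
)

filled = (
    df.set_index("Datetime")
    .groupby("ID")["Value"]
    .resample("15min")
    .last()
    .fillna(0)
    .reset_index()                      # columns: ID, Datetime, Value
    .sort_values(["Datetime", "ID"])    # blocks of a, b, c per timestamp
    .reset_index(drop=True)
)
print(filled)
```

One thing to keep in mind: each ID is resampled over its own first-to-last timestamp, so a gap at the very start or end of a single ID's series is not filled by this approach.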
Answer 2:
Let's reshape the dataframe, then use resample and fillna, and then convert back to the original dataframe structure:
df_out = (df.set_index(['Datetime','ID'])
.unstack()
.resample('15min')
.asfreq()
.fillna(0)
.stack()
.reset_index())
Output:
Datetime ID Value
0 2017-12-11 23:00:00 a 20.0
1 2017-12-11 23:00:00 b 20.9
2 2017-12-11 23:00:00 c 21.0
3 2017-12-11 23:15:00 a 0.0
4 2017-12-11 23:15:00 b 0.0
5 2017-12-11 23:15:00 c 0.0
6 2017-12-11 23:30:00 a 19.8
7 2017-12-11 23:30:00 b 20.8
8 2017-12-11 23:30:00 c 20.8
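The same pipeline split into steps may make the reshaping easier to follow. This is only a sketch: the sample frame is rebuilt by hand from the question, and the intermediate names wide and wide_filled are made up for illustration.

```python
import pandas as pd

# Sample data from the question; the 23:15 block is missing.
df = pd.DataFrame(
    {
        "ID": ["a", "b", "c", "a", "b", "c"],
        "Datetime": pd.to_datetime(
            ["2017-12-11 23:00:00"] * 3 + ["2017-12-11 23:30:00"] * 3
        ),
        "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
    }
)

# Long -> wide: one row per timestamp, one column per ID.
wide = df.set_index(["Datetime", "ID"]).unstack()

# Insert the missing 15-minute rows (as NaN), then replace NaN with 0.
wide_filled = wide.resample("15min").asfreq().fillna(0)

# Wide -> long again, with Datetime and ID back as ordinary columns.
df_out = wide_filled.stack().reset_index()
print(df_out)
```

Because all IDs share one time grid here, a timestamp that is missing for only one ID also gets a 0. The unstack step does require every (Datetime, ID) pair to be unique; with duplicates it raises an error, and an aggregation would be needed first.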
Source: https://stackoverflow.com/questions/48989741/resample-fill-gaps-for-blocks-of-datetime-stamps