Resample/fill gaps for blocks of datetime stamps

Question

Problem

I loaded a CSV into a dataframe that has some datetime gaps. The sampling frequency is 15 min, and for each timestamp there is always a block of three values (one per ID). In this example, the block for 2017-12-11 23:15:00 is missing.

         ID           Datetime   Value
0        a 2017-12-11 23:00:00   20.0
1        b 2017-12-11 23:00:00   20.9
2        c 2017-12-11 23:00:00   21.0
3        a 2017-12-11 23:30:00   19.8
4        b 2017-12-11 23:30:00   20.8
5        c 2017-12-11 23:30:00   20.8

Desired result

What I want to do is resample on Datetime and fill the gaps in Value with zeros:

         ID           Datetime   Value
0        a 2017-12-11 23:00:00   20.0
1        b 2017-12-11 23:00:00   20.9
2        c 2017-12-11 23:00:00   21.0
3        a 2017-12-11 23:15:00   0.0
4        b 2017-12-11 23:15:00   0.0
5        c 2017-12-11 23:15:00   0.0
6        a 2017-12-11 23:30:00   19.8
7        b 2017-12-11 23:30:00   20.8
8        c 2017-12-11 23:30:00   20.8

My Question

Is it possible to accomplish this with resample() alone, or does the solution require combining it with groupby()?

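import pandas as pd

# all_files and headers are defined elsewhere (not shown in the question)
df = pd.concat((pd.read_csv(file, parse_dates=[1], dayfirst=True,
                            names=headers) for file in all_files))
# Attempt: resample the timestamps and fill the gaps with 0
df.set_index("Datetime").resample('15min').fillna(0).reset_index()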
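Because the snippet above depends on files and names that aren't shown (`all_files`, `headers`), here is a minimal stand-in that builds the sample data from the table directly. It also illustrates why `resample()` on the whole frame is not enough: with three rows per timestamp, each 15-minute bin collapses to a single value, so the per-ID blocks are lost. Using `last()` as the aggregation is an assumption chosen for illustration.

```python
import pandas as pd

# Sample data from the question; the 23:15 block is missing
df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(["2017-12-11 23:00"] * 3 + ["2017-12-11 23:30"] * 3),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
})

# Resampling the whole frame aggregates all IDs in a bin together,
# giving one row per 15-minute bin instead of one row per (bin, ID)
collapsed = df.set_index("Datetime")["Value"].resample("15min").last().fillna(0)
print(collapsed)
# 23:00 -> 21.0 (last of a, b, c), 23:15 -> 0.0, 23:30 -> 20.8
```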

Answer 1:

You can group by ID and use resample, taking last() (or mean(), if you would rather average) in case there are multiple values for a single timestamp:

df.groupby('ID').resample('15min').last().fillna(0)

This resamples the data for each ID and takes the last value in each sample period (there should mostly be 1 or 0 values per period). Where a period has a time in the index but no values, last() returns NaN (Not a Number), and fillna(0) replaces it with 0.
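A small sketch of the difference between `last()` and `mean()` (pandas' name for the average), using hypothetical data for a single ID in which two readings fall into the same 15-minute bin. In both cases the empty 23:15 bin comes out as NaN and `fillna(0)` turns it into 0.

```python
import pandas as pd

# Hypothetical readings for one ID: two values fall inside the 23:00 bin
s = pd.Series(
    [20.0, 20.4, 19.8],
    index=pd.to_datetime(
        ["2017-12-11 23:00", "2017-12-11 23:05", "2017-12-11 23:30"]
    ),
)

# last(): keep the final reading in each 15-minute bin
by_last = s.resample("15min").last().fillna(0)

# mean(): average the readings in each bin instead
by_mean = s.resample("15min").mean().fillna(0)

print(by_last)  # 23:00 -> 20.4, 23:15 -> 0.0, 23:30 -> 19.8
print(by_mean)  # 23:00 -> about 20.2, 23:15 -> 0.0, 23:30 -> 19.8
```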

Note that this only works if the index has the appropriate type. Since you are parsing dates, calling df.dtypes lets you confirm that the Datetime column has a valid datetime type. If you plan to do many (or any) time-based operations, I would recommend setting 'Datetime' as the index and mostly leaving it there (i.e., do this before running the command above!):

df.set_index('Datetime', inplace=True)
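A minimal sketch of the dtype check described above, assuming a hypothetical frame whose Datetime column was read as plain strings. The `format` string (day first, dot-separated) is an assumption about the CSV, not something the question states.

```python
import pandas as pd

# Hypothetical frame where Datetime was read as plain strings (object dtype)
df = pd.DataFrame({
    "ID": ["a", "b", "c"],
    "Datetime": ["11.12.2017 23:00"] * 3,
    "Value": [20.0, 20.9, 21.0],
})
print(df.dtypes)

# Convert only if the column is not already datetime64;
# the day-first format string is an assumption about the CSV
if not pd.api.types.is_datetime64_any_dtype(df["Datetime"]):
    df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d.%m.%Y %H:%M")

# With a DatetimeIndex, resample() can be used directly
df.set_index("Datetime", inplace=True)
print(type(df.index).__name__)  # DatetimeIndex
```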

Running the groupby/resample command on the indexed frame then gives this MultiIndex DataFrame:

Out[76]: 
                       ID  Value
ID Datetime                     
a  2018-02-26 23:00:00  a   20.0
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  a   19.8
b  2018-02-26 23:00:00  b   20.9
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  b   20.8
c  2018-02-26 23:00:00  c   21.0
   2018-02-26 23:15:00  0    0.0
   2018-02-26 23:30:00  c   20.8

If you only need the Value column, a bit more reshaping gives a slightly different dataframe with a single index. This has the benefit of avoiding odd values in the ID column (see the 0 entries above):

(df.groupby('ID')['Value']
 .resample('15min')
 .last()
 .fillna(0)
 .reset_index()
 .set_index('Datetime')
 .sort_index())

Out[107]: 
                    ID  Value
Datetime                     
2018-02-26 23:00:00  a   20.0
2018-02-26 23:00:00  b   20.9
2018-02-26 23:00:00  c   21.0
2018-02-26 23:15:00  a    0.0
2018-02-26 23:15:00  b    0.0
2018-02-26 23:15:00  c    0.0
2018-02-26 23:30:00  a   19.8
2018-02-26 23:30:00  b   20.8
2018-02-26 23:30:00  c   20.8
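If you want exactly the layout of the desired result (ID, Datetime and Value columns with a plain 0..8 row index), the chain above can be finished with a sort and a column reorder instead of `set_index`. A minimal sketch on the question's sample data; the `sort_values` and column-selection steps are additions, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with Datetime already set as the index
df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(["2017-12-11 23:00"] * 3 + ["2017-12-11 23:30"] * 3),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
}).set_index("Datetime")

result = (df.groupby("ID")["Value"]
            .resample("15min")
            .last()
            .fillna(0)
            .reset_index()                       # columns: ID, Datetime, Value
            .sort_values(["Datetime", "ID"])     # timestamp blocks, IDs in order
            .reset_index(drop=True)              # fresh 0..n-1 row labels
            [["ID", "Datetime", "Value"]])       # column order of the desired result
print(result)
```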

Answer 2:

Let's reshape the dataframe so that each ID becomes a column, then resample and fillna, then convert back to the original dataframe structure:

df_out = (df.set_index(['Datetime','ID'])
            .unstack()
            .resample('15min')
            .asfreq()
            .fillna(0)
            .stack()
            .reset_index())

Output:

             Datetime ID  Value
0 2017-12-11 23:00:00  a   20.0
1 2017-12-11 23:00:00  b   20.9
2 2017-12-11 23:00:00  c   21.0
3 2017-12-11 23:15:00  a    0.0
4 2017-12-11 23:15:00  b    0.0
5 2017-12-11 23:15:00  c    0.0
6 2017-12-11 23:30:00  a   19.8
7 2017-12-11 23:30:00  b   20.8
8 2017-12-11 23:30:00  c   20.8
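The reshape is what lets this approach work: after `unstack()` each timestamp appears exactly once, so `asfreq()` only has to insert the missing 23:15 row. A small sketch on the question's sample data that prints the intermediate shapes:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["a", "b", "c", "a", "b", "c"],
    "Datetime": pd.to_datetime(["2017-12-11 23:00"] * 3 + ["2017-12-11 23:30"] * 3),
    "Value": [20.0, 20.9, 21.0, 19.8, 20.8, 20.8],
})

# unstack() moves ID into the columns: one row per timestamp, so the
# DatetimeIndex becomes unique (columns are ("Value", "a"), ("Value", "b"), ...)
wide = df.set_index(["Datetime", "ID"]).unstack()
print(wide.shape)  # (2, 3)

# asfreq() then inserts the missing 23:15 row, filled with NaN
filled = wide.resample("15min").asfreq()
print(filled.shape)  # (3, 3)
print(filled.isna().sum().sum())  # 3 NaN cells, all in the 23:15 row
```

From here, `fillna(0)`, `stack()` and `reset_index()` return to the long format shown in the output above.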

Source: https://stackoverflow.com/questions/48989741/resample-fill-gaps-for-blocks-of-datetime-stamps

Tags

python-3.x

pandas

datetime

resampling