Here is my some Dataset which having Time,Temperature1,Temperature2
Timestamp. Temperature1. Temperature2
09/01/2016 00:00:08 53.
Assuming that the timestamps have been converted to datetime
, if you set the index to the timestamp column and then reindex with a date range then the missing values will show up:
In [94]:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
df
Out[94]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
In [96]:
df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq='30s'))
Out[96]:
Temperature1 Temperature2
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
This assumes that the timestamps are regular, here we construct a date range using the timestamp first and last values with a frequency of 30 seconds:
In [97]:
pd.date_range(start=df.index[0], end=df.index[-1], freq='30s')
Out[97]:
DatetimeIndex(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
'2016-09-01 00:01:08', '2016-09-01 00:01:38',
'2016-09-01 00:02:08', '2016-09-01 00:02:38',
'2016-09-01 00:03:08', '2016-09-01 00:03:38',
'2016-09-01 00:04:08', '2016-09-01 00:04:38',
'2016-09-01 00:05:08', '2016-09-01 00:05:38'],
dtype='datetime64[ns]', freq='30S')
When you reindex
with this, any missing index labels become NaN
values
you can use resample('30S', base=8) method:
In [20]: x.resample('30S', base=8).mean()
Out[20]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
The solution above assumes that the Timestamp
is of datetime
dtype and that it has been set as index.
If Timestamp
is a regular column (not index), then starting from Pandas 0.19.0 we can resample on regular columns (it must be of datetime
dtype), using on='column_name'
parameter:
In [26]: x.resample('30S', on='Timestamp', base=8).mean()
Out[26]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
if you need to find your base
value dynamically you can do it this way:
In [21]: x.index[0].second
Out[21]: 8
from docs:
base : int, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for
5min
frequency, base could range from0
through4
.Defaults to
0