Question
I have a data frame f with data at a 10-minute time step, like so:
DateTime id value name
2015-01-01 00:00:00 40497 0 HY
2015-01-01 00:00:00 51395 589 HY
2015-01-01 00:10:00 51395 583 HY
2015-01-01 00:10:00 40497 0 HY
2015-01-01 00:20:00 51395 586 HY
2015-01-01 00:20:00 40497 0 HY
2015-01-01 00:30:00 40497 0 HY
2015-01-01 00:30:00 51395 586 HY
2015-01-01 00:40:00 40497 0 HY
The columns id and name are not relevant to what I want to do. The structure of the data frame is as follows:
'data.frame': 9510 obs. of 4 variables:
$ DateTime : POSIXct, format: "2019-10-27 00:00:00" "2019-10-27 00:10:00" "2019-10-27 00:20:00" ...
$ id : int 40497 40497 40497 40497 40497 40497 40497 40497 40497 40497 ...
$ value : int 1445 1444 1433 1431 1430 1431 1427 1411 1411 1410 ...
$ name: chr "HY" "HY" "HY" "HY" ...
I want to sum the value column by hour for the data of the year 2019; past and future years are not important to me. At first glance this is not that hard, and there are a lot of answers to this question. One would do the following:
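library(dplyr)
library(lubridate)

f <- f %>%
  mutate(Year = year(DateTime)) %>%
  filter(Year == 2019) %>%
  # split each timestamp into its calendar day and hour of day
  mutate(day = floor_date(DateTime, 'day'), h = hour(DateTime)) %>%
  group_by(day, h) %>%
  mutate(sum_col = sum(value)) %>%
  # keep one row per (day, h) group
  distinct(Year, .keep_all = TRUE) %>%
  ungroup()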
The issue is that I have to consider daylight saving time, more specifically 27/10/2019 02:00:00. In my results data frame I need two rows for this value: one for the usual hour and one for daylight saving time. The data already has "double values" for each of the 10-minute steps between 02:00 and 03:00 and looks like this, but of course with multiple ids:
DateTime id value name
2019-10-27 02:00:00 40497 1403 HY
2019-10-27 02:10:00 40497 1396 HY
2019-10-27 02:20:00 40497 1395 HY
2019-10-27 02:30:00 40497 1396 HY
2019-10-27 02:40:00 40497 1380 HY
2019-10-27 02:50:00 40497 1374 HY
2019-10-27 02:00:00 40497 1373 HY
2019-10-27 02:10:00 40497 1374 HY
2019-10-27 02:20:00 40497 1373 HY
2019-10-27 02:30:00 40497 1373 HY
2019-10-27 02:40:00 40497 1373 HY
2019-10-27 02:50:00 40497 1373 HY
2019-10-27 03:00:00 40497 1367 HY
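To make the problem reproducible, the excerpt above can be rebuilt and run through the same day/hour grouping. This is only a sketch: the name demo is made up for illustration, and the timestamps are parsed as UTC purely so that R keeps the wall-clock values as written instead of trying to resolve the repeated hour itself.

```r
library(dplyr)
library(lubridate)

# Wall-clock timestamps from the excerpt: 02:00-02:50 appears twice.
stamps <- c(
  sprintf("2019-10-27 02:%02d:00", seq(0, 50, by = 10)),  # first pass through 02:00
  sprintf("2019-10-27 02:%02d:00", seq(0, 50, by = 10)),  # repeated hour after the clock change
  "2019-10-27 03:00:00"
)

# demo is a hypothetical stand-in for f, restricted to one id.
demo <- data.frame(
  DateTime = as.POSIXct(stamps, tz = "UTC"),  # UTC is an assumption, see above
  id = 40497L,
  value = c(1403, 1396, 1395, 1396, 1380, 1374,
            1373, 1374, 1373, 1373, 1373, 1373, 1367),
  name = "HY"
)

# Same grouping keys as the pipeline above, reduced to one row per group.
hourly <- demo %>%
  group_by(day = floor_date(DateTime, "day"), h = hour(DateTime)) %>%
  summarise(n_rows = n(), sum_col = sum(value), .groups = "drop")

print(hourly)
```

With these keys all 12 readings stamped 02:xx land in a single group, so the two passes through 02:00 are summed together instead of giving two rows.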
My question is: how can I group by hour, regardless of name and id, sum the value column, and get 2 rows for 2019-10-27 02:00:00, the first for the "real one" and the other for daylight saving?
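One possible direction, sketched under assumptions rather than offered as the definitive answer: if the rows for each id are still in their original chronological order, the second occurrence of a timestamp can be told apart from the first with a per-id occurrence counter, and that counter can serve as an extra grouping key. The names demo, pass and hour_start are made up for illustration, and UTC is used only to keep the wall-clock values as written.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for f: the repeated 02:00 hour for a single id.
stamps <- c(
  sprintf("2019-10-27 02:%02d:00", seq(0, 50, by = 10)),
  sprintf("2019-10-27 02:%02d:00", seq(0, 50, by = 10)),
  "2019-10-27 03:00:00"
)
demo <- data.frame(
  DateTime = as.POSIXct(stamps, tz = "UTC"),
  id = 40497L,
  value = c(1403, 1396, 1395, 1396, 1380, 1374,
            1373, 1374, 1373, 1373, 1373, 1373, 1367),
  name = "HY"
)

hourly <- demo %>%
  # Assumption: within each id, rows are in chronological order, so the
  # second row carrying the same wall-clock time is the repeated hour.
  group_by(id, DateTime) %>%
  mutate(pass = row_number()) %>%  # 1 = first occurrence, 2 = repeat
  ungroup() %>%
  # Sum across ids and names, keeping the two passes apart.
  group_by(hour_start = floor_date(DateTime, "hour"), pass) %>%
  summarise(sum_col = sum(value), .groups = "drop")

print(hourly)
```

On this excerpt the result has two rows for 2019-10-27 02:00:00 (pass 1 and pass 2) and one row for 03:00:00. The counter relies entirely on row order, so it would need checking against the real data, for example by confirming that only the clock-change hour ever gets pass == 2.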
Source: https://stackoverflow.com/questions/61551392/r-dplyr-grouping-with-daylight-saving-time