Extract/subset minute values from each hour

Question

My data frame contains date-time values in the format YYYY-MM-DD HH:MM:SS across 125000+ rows, broken down by the minute (each row represents a single minute).

1 2018-01-01 00:04:00
2 2018-01-01 00:05:00
3 2018-01-01 00:06:00
4 2018-01-01 00:07:00
5 2018-01-01 00:08:00
6 2018-01-01 00:09:00
...
124998 2018-03-29 05:07:00
124999 2018-03-29 05:08:00
125000 2018-03-29 05:09:00

I want to subset the data by extracting all of the minute values within any given hour and saving the results into individual data frames.

I have used subset() combined with grepl() to no avail. I have tried setting start = and stop = parameters but also to no avail.

For every HH value, I want to extract all rows with that HH value and then create a new data frame for each one.

For example, I would like one data frame per hour, containing every minute's values for that hour (the full hour's worth of data), resulting in data frames such as:

2018-01-01 00:00:00 (contains data from 2018-01-01 00:00:00 to 2018-01-01 00:59:00 (inclusive))
2018-01-01 01:00:00 (contains data from 2018-01-01 01:00:00 to 2018-01-01 01:59:00 (inclusive))

and so on.

Is there a quick way to achieve this or is it a more laborious task?

Note: I am aware that my desired result will produce a lot of data frames, and that is fine for my particular project as I will only be working on a single one-hour block at any one time.

Answer 1:

This will produce a list of data frames, one per hour, assuming your data frame is called data and its first column, V1, is a date-time (POSIXct) column:

split(data, format(data$V1, "%Y-%m-%d %H"))
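
As a minimal sketch of how this might look end to end, the example below builds a small hypothetical data frame (the sample rows and the value column are made up for illustration), converts V1 from character to POSIXct, and then picks out a single one-hour block by name. The UTC time zone is an assumption; use whatever zone your timestamps were recorded in.

```r
# Hypothetical sample data: V1 starts out as character, as it often does after import
data <- data.frame(
  V1 = c("2018-01-01 00:04:00", "2018-01-01 00:05:00",
         "2018-01-01 01:00:00", "2018-01-02 00:30:00"),
  value = 1:4,
  stringsAsFactors = FALSE
)

# format() with a strftime pattern needs a date-time object, so convert first
data$V1 <- as.POSIXct(data$V1, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# One list element per calendar hour, named "YYYY-MM-DD HH"
by_hour <- split(data, format(data$V1, "%Y-%m-%d %H"))

# Work on a single one-hour block at a time by indexing the list by name
block <- by_hour[["2018-01-01 00"]]
print(names(by_hour))
print(block)
```

Keeping the result as a named list, rather than creating thousands of separate objects, means any one-hour block can be looked up by its "YYYY-MM-DD HH" key when it is needed.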

Answer 2:

I have come up with a solution which extracts every minute (MM) row for a given hour (here 00) from the main data frame:

df <- buckets[grepl("00:\\d+:00$", buckets$time), ]

To separate it for each hour, I will simply change the first 00 depending on which hour I want to focus on and I can then perform a similar function to extract each individual date value.
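
A rough sketch of how this string-based approach could be extended, assuming the time column is stored as character in the "YYYY-MM-DD HH:MM:SS" layout (the sample rows below are hypothetical). The pattern above matches hour 00 on every date, so the sketch also shows a pattern anchored on the date, and a split() on the first 13 characters that avoids editing the pattern by hand for each hour.

```r
# Hypothetical data in the same shape: 'time' is a character column
buckets <- data.frame(
  time = c("2018-01-01 00:04:00", "2018-01-01 00:05:00",
           "2018-01-02 00:10:00", "2018-01-02 13:45:00"),
  stringsAsFactors = FALSE
)

# The original pattern: hour 00 on any date
any_date_hour00 <- buckets[grepl("00:\\d+:00$", buckets$time), , drop = FALSE]

# Anchoring the date as well restricts the match to one specific hour
one_hour <- buckets[grepl("^2018-01-01 00:", buckets$time), , drop = FALSE]

# The first 13 characters ("YYYY-MM-DD HH") identify the hour,
# so splitting on them yields one data frame per date and hour
by_hour <- split(buckets, substr(buckets$time, 1, 13))

print(nrow(any_date_hour00))
print(nrow(one_hour))
print(names(by_hour))
```

The drop = FALSE argument is only there because this example has a single column; without it, row-subsetting a one-column data frame returns a plain vector.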

Answer 3:

If you want to access each individual component of a date, lubridate has built-in functions for that.

library(lubridate)
library(dplyr)  # provides %>% and mutate()
data %>% mutate(year = year(x), month = month(x), day = day(x), hour = hour(x))

So you can get the same splits (but in a more cumbersome manner) by doing:

data %>% mutate(year = year(x), month = month(x), day = day(x), hour = hour(x))  %>%
  group_by(year, month, day, hour) %>% 
  split(list(.$year, .$month, .$day, .$hour))

The dummy data

x <- seq(as.POSIXct("2018-01-01 00:00:00"), as.POSIXct("2018-01-04 23:59:59"), length.out = 1000)
data <- data.frame(x)
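
For comparison, the same hourly split can be sketched on this kind of dummy data in base R alone, using cut() with breaks = "hour" instead of separate year, month, day and hour columns. This is an alternative to the lubridate approach, not part of it; the end time of 23:59:59 and the UTC time zone are assumptions made so that the example is reproducible.

```r
# Dummy data spanning four full days; tz = "UTC" is an assumption
x <- seq(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"),
         as.POSIXct("2018-01-04 23:59:59", tz = "UTC"),
         length.out = 1000)
data <- data.frame(x)

# cut() assigns each timestamp to the hour it falls in
hour_bin <- cut(data$x, breaks = "hour")

# drop = TRUE discards any hour that contains no rows
by_hour <- split(data, hour_bin, drop = TRUE)

print(length(by_hour))              # number of one-hour data frames
print(sum(sapply(by_hour, nrow)))   # rows across all of them
```

Because every row lands in exactly one hourly bin, the row counts of the pieces add up to the row count of the original data frame, which is a quick sanity check on any of the splitting approaches.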

Source: https://stackoverflow.com/questions/49669327/extract-subset-minute-values-from-each-hour

Tags

subset

grepl