I have a dataframe where one of the columns contains dates (some dates appear multiple times). I want to aggregate the dates by week. The best way I can think of this is to
With lubridate you could try this:
library(lubridate)
dates <- seq.Date(as.Date("2016-04-04"), as.Date("2016-04-14"), by = 1)
floor_date(dates - 1, "weeks") + 1
floor_date starts weeks on Sunday by default. To keep each Sunday in the week that began on the preceding Monday, rather than letting it open the next week, subtract one day before rounding and add one day back afterwards.
cut() from base R has methods for objects of class Date and POSIXt. Both assume that weeks start on Monday by default (this can be changed to Sunday with start.on.monday = FALSE).
dates <- c("2016-04-04", "2016-04-05", "2016-04-06", "2016-04-07", "2016-04-08",
"2016-04-09", "2016-04-10", "2016-04-11", "2016-04-12", "2016-04-13",
"2016-04-14")
result <- data.frame(
  dates,
  cut_Date = cut(as.Date(dates), "week"),
  cut_POSIXt = cut(as.POSIXct(dates), "week"),
  stringsAsFactors = FALSE)
result
#        dates   cut_Date cut_POSIXt
#1  2016-04-04 2016-04-04 2016-04-04
#2  2016-04-05 2016-04-04 2016-04-04
#3  2016-04-06 2016-04-04 2016-04-04
#4  2016-04-07 2016-04-04 2016-04-04
#5  2016-04-08 2016-04-04 2016-04-04
#6  2016-04-09 2016-04-04 2016-04-04
#7  2016-04-10 2016-04-04 2016-04-04
#8  2016-04-11 2016-04-11 2016-04-11
#9  2016-04-12 2016-04-11 2016-04-11
#10 2016-04-13 2016-04-11 2016-04-11
#11 2016-04-14 2016-04-11 2016-04-11
Note that cut() returns factors, which is ideal for the aggregation the OP asked for:
str(result)
#'data.frame': 11 obs. of 3 variables:
# $ dates : chr "2016-04-04" "2016-04-05" "2016-04-06" "2016-04-07" ...
# $ cut_Date : Factor w/ 2 levels "2016-04-04","2016-04-11": 1 1 1 1 1 1 1 2 2 2 ...
# $ cut_POSIXt: Factor w/ 2 levels "2016-04-04","2016-04-11": 1 1 1 1 1 1 1 2 2 2 ...
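To make the aggregation step concrete, here is a minimal base R sketch. The data frame df and its columns date and value are made up for illustration (they are not from the question); the idea is simply to use the factor returned by cut() as the grouping variable.

```r
# Hypothetical example data: some dates appear more than once
df <- data.frame(
  date  = as.Date(c("2016-04-04", "2016-04-04", "2016-04-06",
                    "2016-04-10", "2016-04-11", "2016-04-14")),
  value = c(1, 2, 3, 4, 5, 6)
)

# Label every row with the Monday that starts its week
df$week <- cut(df$date, "week")

# Number of rows per week
counts <- table(df$week)

# Sum of value per week
sums <- aggregate(value ~ week, data = df, FUN = sum)

counts
sums
```

The same grouping factor should also work with tapply() or with grouping functions from other packages, if you prefer those.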
However, for plotting aggregated values with ggplot2 (and if there is a large number of weeks, which might clutter the axis) it might be better to switch from a discrete time scale to a continuous time scale. Then it is necessary to coerce the factors back to Date or POSIXct:
as.Date(as.character(result$cut_Date))
as.POSIXct(as.character(result$cut_Date))
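As a rough sketch of what that could look like with ggplot2: the dates below are invented sample data, and the choice of geom_col() and the axis break settings are only one possible way to draw it.

```r
library(ggplot2)

# Hypothetical sample: 60 consecutive days starting on a Monday
dates <- as.Date("2016-04-04") + 0:59

# Count observations per week; cut() yields a factor of week start dates
weekly <- as.data.frame(table(week = cut(dates, "week")), responseName = "n")

# Coerce the factor back to Date so ggplot2 uses a continuous date scale
weekly$week <- as.Date(as.character(weekly$week))

p <- ggplot(weekly, aes(x = week, y = n)) +
  geom_col() +
  scale_x_date(date_breaks = "2 weeks", date_labels = "%d %b")

print(p)
```

With the week column left as a factor, ggplot2 would instead draw one discrete axis label per week.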
Since lubridate version 1.7.0, the week_start parameter of the floor_date function lets you specify which day the week begins on. This allows you to write:
library(lubridate)
dates <- seq.Date(as.Date("2016-04-04"), as.Date("2016-04-14"), by = 1)
floor_date(dates, "weeks", week_start = 1)
I would post it as a comment to Sraffa's response but I don't have the reputation.
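If you want to convince yourself that both lubridate approaches give the same week starts, a quick check along these lines should do. It assumes lubridate 1.7.0 or later and that the lubridate.week.start option has not been changed from its default (Sunday).

```r
library(lubridate)

dates <- seq.Date(as.Date("2016-04-04"), as.Date("2016-04-14"), by = 1)

# Older workaround: shift by one day around the Sunday-based floor
shifted <- floor_date(dates - 1, "weeks") + 1

# Direct approach: tell floor_date that weeks start on Monday
direct <- floor_date(dates, "weeks", week_start = 1)

# Compare the two results element by element
all(shifted == direct)
```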