I have a data frame which has two columns of dates in the format yyyy/mm/dd. I am trying to calculate the number of days between these two dates for each observation within
In Ronald's example, if the date formats are different (as displayed below) then modify the format
parameter
survey <- data.frame(date=c("2012-07-26","2012-07-25"),tx_start=c("2012-01-01","2012-01-01"))
survey$date_diff <- as.Date(as.character(survey$date), format="%Y-%m-%d")-
as.Date(as.character(survey$tx_start), format="%Y-%m-%d")
survey:
date tx_start date_diff
1 2012-07-26 2012-01-01 207 days
2 2012-07-25 2012-01-01 206 days
You need to use the as.Date formats correctly.
Eg.
x = '2012/07/25'
xd = as.Date(x,'%Y/%m/%d')
xd # Prints "2012-07-25"
R date formats are similary to *nix ones.
Doing a typeof(xd) shows it as a double ie. days since 1970.
Without your seeing your data (you can use the output of dput(head(survey))
to show us) this is a shot in the dark:
survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))
survey$date_diff <- as.Date(as.character(survey$date), format="%Y/%m/%d")-
as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
survey
date tx_start date_diff
1 2012/07/26 2012/01/01 207 days
2 2012/07/25 2012/01/01 206 days
You could find the difference between dates in columns in a data frame by using the function difftime
as follows:
df$diff_in_days<- difftime(df$datevar1 ,df$datevar2 , units = c("days"))
Following Ronald Example I would like to add that it should be considered if the origin and end dates must be included or not in the days count between two dates. I faced the same problem and ended up using a third option with apply. It could be memory inefficient but helps to understand the problem:
survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))
survey$diff_1 <- as.numeric(
as.Date(as.character(survey$date), format="%Y/%m/%d")-
as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
)
survey$diff_2<- as.numeric(
difftime(survey$date ,survey$tx_start , units = c("days"))
)
survey$diff_3 <- apply(X = survey[,c("date", "tx_start")],
MARGIN = 1,
FUN = function(x)
length(
seq.Date(
from = as.Date(x[2]),
to = as.Date(x[1]),
by = "day")
)
)
This gives the following date differences:
date tx_start diff_1 diff_2 diff_3
1 2012/07/26 2012/01/01 207 206.9583 208
2 2012/07/25 2012/01/01 206 205.9583 207