Calculate Difference between dates by group in R

☆樱花仙子☆ 提交于 2019-11-29 16:07:25

Collecting some of the comments...

Load dplyr

We need only the dplyr package for this problem. If we load other packages, e.g. plyr, it can cause conflicts if both packages have functions with the same name. Let's load only dplyr.

library(dplyr)

In the future, you may wish to load tidyverse instead -- it includes dplyr and other related packages, for graphics, etc.

Converting dates

Let's convert the DateVisit variable from character strings to something R can interpret as a date. Once we do this, it allows R to calculate differences in days by subtracting two dates from each other.

HS_Hatch <- HS_Hatch %>%
 mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))

The date format %m/%d/%Y is different from your original code. This date format needs to match how dates look in your data. DateVisit has dates as month/day/year, so we use %m/%d/%Y.

Also, you don't need to specify the dataset for DateVisit inside mutate, as in HS_Hatch$DateVisit, because it's already looking in HS_Hatch. The code HS_Hatch %>% ... says 'use HS_Hatch for the following steps'.

Calculating exposures

To calculate exposure, we need to find the first date, last date, and then the difference between the two, for each set of rows by ClutchID. We use summarize, which collapses the data to one row per ClutchID.

exposure <- HS_Hatch %>% 
    group_by(ClutchID) %>%
    summarize(first_visit = min(date_visit), 
              last_visit = max(date_visit), 
              exposure = last_visit - first_visit)

first_visit = min(date_visit) will find the minimum date_visit for each ClutchID separately, since we are using group_by(ClutchID).

exposure = last_visit - first_visit takes the newly-calculated first_visit and last_visit and finds the difference in days.

This creates the following result:

  ClutchID first_visit last_visit exposure
     <int>      <date>     <date>    <dbl>
1        1  2012-03-15 2012-04-03       19
2        2  2012-03-18 2012-04-04       17
3        3  2012-03-22 2012-04-04       13
4        4  2012-03-18 2012-04-04       17
5        5  2012-03-20 2012-04-05       16

If you want to keep all the original rows, you can use mutate in place of summarize.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!