Question
I would like to calculate age based on birth date.
If I use lubridate, I would just run the following, as in "Efficient and accurate age calculation (in years, months, or weeks) in R given birth date and an arbitrary date":
as.period(new_interval(start = birthdate, end = givendate))$year
However, when I tried to use mutate() in dplyr to create the new variable, I ran into an error.
library(dplyr); library(lubridate)
birthdate <- ymd(c(NA, "1978-12-31", "1979-01-01", "1962-12-30"))
givendate <- ymd(c(NA, "2015-12-31", "2015-12-31", NA))
df <- data.frame(
birthdate = birthdate,
givendate = givendate)
The following works, but it returns the full period, i.e. years, months, days, hours, minutes and seconds, rather than just the years:
df<-df %>% mutate(age=as.period(interval(start = birthdate, end = givendate)))
# df
# birthdate givendate age
# 1 <NA> <NA> <NA>
# 2 1978-12-31 2015-12-31 37y 0m 0d 0H 0M 0S
# 3 1979-01-01 2015-12-31 36y 11m 30d 0H 0M 0S
# 4 1962-12-30 <NA> <NA>
The following does not work:
df<-df %>%
mutate(age=as.period(interval(start = birthdate, end = givendate))$year)
It gives an error:
Error in mutate_impl(.data, dots) : invalid subscript type 'closure'
I thought the missing values might be the cause, so I tried:
df<-df %>%
mutate(age=as.period(interval(start = birthdate, end = givendate))) %>%
mutate(age=if_else(!is.na(age),age$year,age))
It also gives an error:
Error in mutate_impl(.data, dots) : object 'age' not found
Answer 1:
We can use do():
df %>%
mutate(age=as.period(interval(start = birthdate, end = givendate))) %>%
do(data.frame(.[setdiff(names(.), "age")],
age = ifelse(!is.na(.$age), .$age$year, .$age)))
# birthdate givendate age
#1 <NA> <NA> NA
#2 1978-12-31 2015-12-31 37
#3 1979-01-01 2015-12-31 36
#4 1962-12-30 <NA> NA
Because as.period() returns an object of the S4 class Period, we can also extract the year slot directly with the @ operator:
df %>%
mutate(age=as.period(interval(start = birthdate, end = givendate))) %>%
.$age %>%
.@year %>%
mutate(df, age = .)
# birthdate givendate age
#1 <NA> <NA> NA
#2 1978-12-31 2015-12-31 37
#3 1979-01-01 2015-12-31 36
#4 1962-12-30 <NA> NA
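The slot access used above can be checked on a single value. This is a minimal sketch, assuming the lubridate package is installed; the variable names are only illustrative. It reads the same year component three ways: with the @ operator, with base R's slot(), and with lubridate's year() accessor.

```r
library(lubridate)

# Period between one birth date and one reference date taken from the question
p <- as.period(interval(start = ymd("1979-01-01"), end = ymd("2015-12-31")))

# Three equivalent ways to read the "year" slot of an S4 Period object
by_at       <- p@year
by_slot     <- slot(p, "year")
by_accessor <- year(p)

print(c(by_at, by_slot, by_accessor))
```

All three should agree, since year() on a Period simply returns that slot.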
Answer 2:
Within lubridate, Period is an S4 class with a slot "year", and year() is an S3 generic with a method that extracts the year slot from a Period object, i.e. an accessor function for the year component (see https://github.com/hadley/lubridate/blob/master/R/accessors-year.r).
Therefore, the following will work:
df %>% mutate(age = year(as.period(interval(start = birthdate, end = givendate))))
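Because year() is a generic, the same function returns different things depending on the class of its argument. A short sketch, assuming lubridate is installed (the vectors below are illustrative, built from the question's dates): on a Date it returns the calendar year, on a Period it returns the year component, and an NA date yields NA, which is why the call above also handles the rows with missing values.

```r
library(lubridate)

# Base R Dates; the third birth date is missing, as in the question's data
birth <- as.Date(c("1978-12-31", "1979-01-01", NA))
given <- as.Date(c("2015-12-31", "2015-12-31", "2015-12-31"))

# On a Date vector, year() dispatches to the default method: the calendar year
cal_years <- year(given)

# On a Period vector, year() returns the "year" component of each period
ages <- year(as.period(interval(start = birth, end = given)))

print(cal_years)
print(ages)
```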
Answer 3:
We can use the year() function from lubridate to take the difference between the calendar years of the two dates.
library(dplyr); library(lubridate)
df %>% mutate(age = year(givendate) - year(birthdate))
# birthdate givendate age
#1 <NA> <NA> NA
#2 1978-12-31 2015-12-31 37
#3 1979-01-01 2015-12-31 36
#4 1962-12-30 <NA> NA
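Subtracting calendar years gives the completed age only when the birthday has already passed in givendate's year; the sample rows happen to satisfy this. A minimal sketch with a hypothetical birth date of 1979-06-01 and reference date of 2015-01-01 (not from the question), assuming lubridate is installed, shows the two approaches diverging:

```r
library(lubridate)

birth <- as.Date("1979-06-01")  # hypothetical: birthday not yet reached on the reference date
given <- as.Date("2015-01-01")

# Difference of calendar years ignores month and day
calendar_diff <- year(given) - year(birth)

# Completed years, taken from the period between the two dates
completed <- year(as.period(interval(start = birth, end = given)))

print(c(calendar_diff = calendar_diff, completed = completed))
```

Here the calendar-year difference is 36 while the period-based age is 35, so the period approach from the other answers is the safer choice when an exact age is needed.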
Source: https://stackoverflow.com/questions/41714319/calculating-age-using-mutate-with-lubridate-functions