Question
I have an R data frame that can be generated with the code below:
DF <- data.frame("Person_id" = c(1,1,1,1,2,2,2,2,3,3), "Type" = c("IN","OUT","IN","ANC","IN","OUT","IN","ANC","EM","ANC"), "Name" = c("Nara","Nara","Nara","Nara","Dora","Dora","Dora","Dora","Sara","Sara"),"day_1" = c("21/1/2002","21/4/2002","21/6/2002","21/9/2002","28/1/2012","28/4/2012","28/6/2012","28/9/2012","30/06/2004","30/06/2005"),"day_2" = c("23/1/2002","21/4/2002","","","30/1/2012","28/4/2012","","28/9/2012","",""))
I would like to create two new columns, admit_start_date and admit_end_date, based on a few conditions given below.
Rule 1
admit_start_date = day_1
admit_end_date = day_2 (day_2 can sometimes be NA; in that case, see Rule 2 below)
Rule 2
if day_2 is (null or blank or NA) and Type is (OUT or ANC or EM) then
admit_end_date = day_1
else (if Type is IN)
admit_end_date = day_1 + 5 (days)
This is what I tried, but it does not work:
transform_dates = function(DF){ # this function is to create 'date' columns
DF %>%
mutate(admit_start_date = day_1) %>%
mutate(admit_end_date = day_2) %>%
admit_end_date = if_else(((Type == 'Out' & admit_end_date.isna() ==True|Type == 'ANC' & admit_end_date.isna() ==True|Type == 'EM' & admit_end_date.isna() ==True),day_1,day_1 + 5)
)
}
As you can see, I am not sure how to check for NA in a newly created column and replace those NAs with day_1 or day_1 + 5 (days), depending on the Type column.
Can you please help?
I expect my output to look like the one shown below.
Answer 1:
We can use case_when to specify each condition separately, after converting the "day" columns to actual date objects.
library(dplyr)
DF %>%
  mutate_at(vars(starts_with('day')), as.Date, "%d/%m/%Y") %>%
  mutate(admit_start_date = day_1,
         admit_end_date = case_when(
           !is.na(day_2) ~ day_2,
           is.na(day_2) & Type %in% c('OUT', 'ANC', 'EM') ~ day_1,
           Type == 'IN' ~ day_1 + 5))
#   Person_id Type Name      day_1      day_2 admit_start_date admit_end_date
#1          1   IN Nara 2002-01-21 2002-01-23       2002-01-21     2002-01-23
#2          1  OUT Nara 2002-04-21 2002-04-21       2002-04-21     2002-04-21
#3          1   IN Nara 2002-06-21       <NA>       2002-06-21     2002-06-26
#4          1  ANC Nara 2002-09-21       <NA>       2002-09-21     2002-09-21
#5          2   IN Dora 2012-01-28 2012-01-30       2012-01-28     2012-01-30
#6          2  OUT Dora 2012-04-28 2012-04-28       2012-04-28     2012-04-28
#7          2   IN Dora 2012-06-28       <NA>       2012-06-28     2012-07-03
#8          2  ANC Dora 2012-09-28 2012-09-28       2012-09-28     2012-09-28
#9          3   EM Sara 2004-06-30       <NA>       2004-06-30     2004-06-30
#10         3  ANC Sara 2005-06-30       <NA>       2005-06-30     2005-06-30
The dates in the data frame are not of class "Date" (check with class(DF$day_1)), so we use mutate_at to convert them to "Date"; only then can we do arithmetic such as day_1 + 5 on them. The blank day_2 entries cannot be parsed as dates, so they become NA during the conversion, which is why is.na(day_2) catches them. starts_with('day') means that any column whose name starts with "day" is converted to "Date" class. We use mutate_at when we want to apply the same function to multiple columns.
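As a small illustration of the conversion step, here is a base R sketch on a made-up three-element vector (the name example_day_2 is hypothetical and not part of the question's data frame). It shows that a blank string turns into NA once parsed, and that adding 5 to a Date shifts it by five days.

```r
# Hypothetical example vector in the same day/month/year format as the question
example_day_2 <- c("23/1/2002", "", "30/06/2004")

# Parse the strings; the empty string cannot be parsed and becomes NA
parsed <- as.Date(example_day_2, format = "%d/%m/%Y")

class(parsed)   # the parsed vector now has class "Date"
is.na(parsed)   # TRUE only for the element that was blank

# Date arithmetic is in whole days; NA stays NA
shifted <- parsed + 5
shifted
```

Because the blanks are already NA after parsing, the later is.na(day_2) checks do not need a separate test for empty strings.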
case_when is an alternative to nested ifelse statements. Its conditions are checked in order: for each row, the result comes from the first condition that is satisfied, and the remaining conditions are ignored for that row. Hence, no else is required here. If none of the conditions is satisfied, it returns NA. See ?case_when for details.
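The ordering behaviour can be seen in a minimal sketch on a toy numeric vector (x and size are illustrative names, unrelated to the question's data). The value 10 satisfies all three conditions, yet only the first one counts, and the NA element matches none of them.

```r
library(dplyr)

# Toy input; the NA makes every comparison below NA, which case_when treats as "no match"
x <- c(1, 5, 10, NA)

size <- case_when(
  x >= 10 ~ "large",   # checked first
  x >= 5  ~ "medium",  # 10 also satisfies this, but it was already matched above
  x >= 0  ~ "small"    # acts like a final branch for the remaining non-missing values
)

size
```

Listing the most specific condition first is what makes the sequence behave like an if / else if chain.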
Source: https://stackoverflow.com/questions/60014949/mutate-based-on-two-conditions-in-r-dataframe