Mutate based on two conditions in R dataframe

江枫思渺然 submitted on 2020-02-24 11:43:11

Question


I have an R data frame, which can be generated with the code below:

DF <- data.frame(
  "Person_id" = c(1,1,1,1,2,2,2,2,3,3),
  "Type" = c("IN","OUT","IN","ANC","IN","OUT","IN","ANC","EM","ANC"),
  "Name" = c("Nara","Nara","Nara","Nara","Dora","Dora","Dora","Dora","Sara","Sara"),
  "day_1" = c("21/1/2002","21/4/2002","21/6/2002","21/9/2002","28/1/2012","28/4/2012","28/6/2012","28/9/2012","30/06/2004","30/06/2005"),
  "day_2" = c("23/1/2002","21/4/2002","","","30/1/2012","28/4/2012","","28/9/2012","","")
)

I would like to create two new columns, admit_start_date and admit_end_date, based on the conditions given below.

Rule 1

  admit_start_date = day_1
  admit_end_date   = day_2 (sometimes day_2 can be NA. So refer Rule 2 below)

Rule 2

   if day_2 is (null or blank or NA) and Type is (OUT or ANC or EM) then 
         admit_end_date = day_1 
   else (if Type is IN)
         admit_end_date = day_1 + 5 (days)

This is what I tried, but it does not work:

    transform_dates = function(DF){  # this function is to create 'date' columns  
  DF %>% 
    mutate(admit_start_date = day_1) %>% 
    mutate(admit_end_date = day_2) %>%
    admit_end_date = if_else(((Type == 'Out' & admit_end_date.isna() ==True|Type == 'ANC' & admit_end_date.isna() ==True|Type == 'EM' & admit_end_date.isna() ==True),day_1,day_1 + 5)
    )
}  

As you can see, I am not sure how to check for NA for a newly created column and replace those NAs with day_1 or day_1 + 5(days) based on Type column.

Can you please help?

I expect my output to be as shown below


Answer 1:


We can use case_when to specify each condition separately after converting "day" columns to actual date objects.

library(dplyr)

DF %>%
  mutate_at(vars(starts_with('day')), as.Date, "%d/%m/%Y") %>%
  mutate(admit_start_date = day_1, 
         admit_end_date = case_when(
         !is.na(day_2) ~day_2,
         is.na(day_2) & Type %in% c('OUT', 'ANC', 'EM') ~ day_1, 
         Type == 'IN' ~ day_1 + 5))


#  Person_id Type Name      day_1      day_2 admit_start_date admit_end_date
#1          1   IN Nara 2002-01-21 2002-01-23       2002-01-21     2002-01-23
#2          1  OUT Nara 2002-04-21 2002-04-21       2002-04-21     2002-04-21
#3          1   IN Nara 2002-06-21       <NA>       2002-06-21     2002-06-26
#4          1  ANC Nara 2002-09-21       <NA>       2002-09-21     2002-09-21
#5          2   IN Dora 2012-01-28 2012-01-30       2012-01-28     2012-01-30
#6          2  OUT Dora 2012-04-28 2012-04-28       2012-04-28     2012-04-28
#7          2   IN Dora 2012-06-28       <NA>       2012-06-28     2012-07-03
#8          2  ANC Dora 2012-09-28 2012-09-28       2012-09-28     2012-09-28
#9          3   EM Sara 2004-06-30       <NA>       2004-06-30     2004-06-30
#10         3  ANC Sara 2005-06-30       <NA>       2005-06-30     2005-06-30

The date columns in the data frame are not of class "Date" (check with class(DF$day_1)). With mutate_at we convert them to class "Date" so that we can do arithmetic on them. starts_with('day') selects every column whose name starts with "day", and mutate_at applies the same function to each selected column, which is why we use it here instead of converting the columns one by one.
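To make the conversion step concrete, here is a minimal base R sketch. The vector raw_days is a made-up stand-in for day_1 and day_2 (it is not part of the original data) and only assumes the same dd/mm/yyyy layout.

```r
# Hypothetical stand-in for the day_1 / day_2 columns (dd/mm/yyyy strings,
# with one blank entry like the blanks in day_2).
raw_days <- c("21/6/2002", "28/6/2012", "")

class(raw_days)   # character, so date arithmetic is not possible yet

# Parse the strings with an explicit format; the blank string becomes NA.
parsed_days <- as.Date(raw_days, format = "%d/%m/%Y")
class(parsed_days)
is.na(parsed_days)

# Once the values are Dates, adding 5 shifts them by five days.
# NA stays NA.
shifted_days <- parsed_days + 5
shifted_days
```

This is also why blank day_2 values show up as <NA> in the output above. In dplyr 1.0.0 and later, the same column-wise conversion can alternatively be written as mutate(across(starts_with("day"), ~ as.Date(.x, format = "%d/%m/%Y"))).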

case_when is an alternative to nested ifelse calls. Its conditions are checked in the order they are written: for each row, the first condition that is TRUE determines the result, and the remaining conditions are ignored for that row. If the first condition is not satisfied, the second one is tried, and so on. Hence, no explicit else is required here. If none of the conditions is satisfied, the result is NA. See ?case_when.
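The ordering behaviour can be seen in a small sketch on plain vectors. The vectors type, day_1 and day_2 below are illustrative values, not the original DF, and "XX" is a made-up Type that matches no rule.

```r
library(dplyr)

# Illustrative vectors (assumed values, not taken from DF).
type  <- c("IN", "OUT", "IN", "XX")
day_1 <- as.Date(c("2002-01-21", "2002-04-21", "2002-06-21", "2002-09-21"))
day_2 <- as.Date(c("2002-01-23", NA, NA, NA))

admit_end <- case_when(
  # Checked first: any element with a day_2 keeps it.
  !is.na(day_2)                   ~ day_2,
  # Only reached when day_2 is NA, so is.na(day_2) need not be repeated.
  type %in% c("OUT", "ANC", "EM") ~ day_1,
  # Only reached when day_2 is NA and type is not OUT/ANC/EM.
  type == "IN"                    ~ day_1 + 5
)

admit_end
```

The fourth element matches none of the conditions, so it comes back as NA. Because earlier branches take precedence, the is.na(day_2) test in the second condition of the answer is harmless but not strictly needed.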



Source: https://stackoverflow.com/questions/60014949/mutate-based-on-two-conditions-in-r-dataframe
