Recode multiple columns using dplyr

一向 2020-12-11 19:19

I had a dataframe where I recoded several columns so that 999 was set to NA

dfB <-dfA %>%
  mutate(adhere = if_else(adhere==999, as.numeric(NA), adhere         


        
5 Answers
  • 2020-12-11 19:47

    I think it is related to the column type. I added mutate_if to convert all integer columns to numeric, and then set the recode value to NA_real_. That seems to work.

    library(dplyr)
    
    y <- data.frame(y1=c(1,2,999,3,4), y2=c(1L, 2L, 999L, 3L, 4L), y3=c(T,T,F,F,T))
    
    z <- y %>%
      mutate_if(is.integer, as.numeric) %>%
      mutate_at(vars(y1:y2), funs(recode(.,`999` = NA_real_)))
    z
    #   y1 y2    y3
    # 1  1  1  TRUE
    # 2  2  2  TRUE
    # 3 NA NA FALSE
    # 4  3  3 FALSE
    # 5  4  4  TRUE
    
  • 2020-12-11 19:54

    Now that funs has been deprecated in dplyr, here's the new way to go:

    z <- y %>%
      mutate_if(is.integer, as.numeric) %>%
      mutate_at(vars(y1:y2), list(~recode(.,`999` = NA_real_)))
    

    Replace funs with list and insert a ~ before recode.

  • 2020-12-11 19:57

    If you are trying to recode something to NA, the na_if() function should also work.
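
    As a minimal sketch of that suggestion: the data frame below is hypothetical (the column name adhere is borrowed from the question, the values are made up), and it assumes 999 is the only code that should become missing.

    ```r
    library(dplyr)

    # Hypothetical data: 999 marks a missing value in `adhere`
    df <- data.frame(id = 1:4, adhere = c(1, 999, 3, 999))

    # na_if(x, y) returns x with every element equal to y replaced by NA
    df_clean <- df %>%
      mutate(adhere = na_if(adhere, 999))

    df_clean$adhere
    ```

    Because na_if() keeps the type of its first argument, there is no need to pick a typed NA by hand.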

  • 2020-12-11 20:03

    I'm having trouble understanding exactly what you want to accomplish, so let me know if this isn't quite it.


    library(dplyr)
    
    y <- data.frame(y1=c(1,2,999,3,4), y2=c(1L, 2L, 999L, 3L, 4L), y3=c(T,T,F,F,T))
    
    y
    
    #>    y1  y2    y3
    #> 1   1   1  TRUE
    #> 2   2   2  TRUE
    #> 3 999 999 FALSE
    #> 4   3   3 FALSE
    #> 5   4   4  TRUE
    
    z <- y %>%
      mutate_at(vars(y1:y2), ~ifelse(. == 999, NA, .))
    
    z
    
    #>   y1 y2    y3
    #> 1  1  1  TRUE
    #> 2  2  2  TRUE
    #> 3 NA NA FALSE
    #> 4  3  3 FALSE
    #> 5  4  4  TRUE
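
    Base ifelse() does not check that both branches have the same type, which is why a plain NA works above. dplyr's if_else() is stricter, and depending on the dplyr version it may reject an NA whose type differs from the column. A hedged sketch of the if_else() variant, assuming you spell out the typed NA for each column:

    ```r
    library(dplyr)

    y <- data.frame(y1 = c(1, 2, 999, 3, 4), y2 = c(1L, 2L, 999L, 3L, 4L))

    z <- y %>%
      mutate(
        # y1 is double, so the missing value is NA_real_
        y1 = if_else(y1 == 999, NA_real_, y1),
        # y2 is integer, so the missing value is NA_integer_
        y2 = if_else(y2 == 999L, NA_integer_, y2)
      )

    z
    ```

    This keeps y2 as an integer column instead of converting it to numeric first.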
    
  • 2020-12-11 20:12

    Currently, based on dplyr documentation:

    across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

    So, using mutate and across instead is now recommended.

    Like Chris LeBoa said, if you only want to convert an annoying value to NA, the function na_if() is probably the best choice:

    library(dplyr)
    
    y <- data.frame(y1=c(1,2,999,3,4), y2=c(1L, 2L, 999L, 3L, 4L), y3=c(T,T,F,F,T))
    
    y
    #    y1  y2    y3
    # 1   1   1  TRUE
    # 2   2   2  TRUE
    # 3 999 999 FALSE
    # 4   3   3 FALSE
    # 5   4   4  TRUE
    
    # Replace 999 with NA in columns y1 through y2
    z <- y %>%
        mutate(across(
            y1:y2,
            ~na_if(., 999)
        ))
    
    z
    #   y1 y2    y3
    # 1  1  1  TRUE
    # 2  2  2  TRUE
    # 3 NA NA FALSE
    # 4  3  3 FALSE
    # 5  4  4  TRUE
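
    If the columns to clean are not adjacent, across() also accepts a predicate. The sketch below is an assumption about what you might want rather than part of the original answers: it picks every numeric column with where(is.numeric) instead of naming y1:y2.

    ```r
    library(dplyr)

    y <- data.frame(y1 = c(1, 2, 999, 3, 4),
                    y2 = c(1L, 2L, 999L, 3L, 4L),
                    y3 = c(TRUE, TRUE, FALSE, FALSE, TRUE))

    # where(is.numeric) selects y1 and y2; the logical column y3 is skipped
    z <- y %>%
      mutate(across(where(is.numeric), ~ na_if(., 999)))

    z
    ```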
    

    Similarly, if you really want to recode values in multiple columns, you can follow the example from bcarothers:

    df1 <- tibble(Q7_1=1:5,
                  Q7_1_TEXT=c("let's","see","grogu","this","week"),
                  Q8_1=6:10,
                  Q8_1_TEXT=rep("grogu",5),
                  Q8_2=11:15,
                  Q8_2_TEXT=c("grogu","is","the","absolute","best"))
    
    df2 <- df1 %>%
        mutate(across(
            starts_with("Q8") & ends_with("TEXT"),
            ~recode(., "grogu"="mando")
        ))
    