Execute dplyr operation only if column exists

青春惊慌失措 2021-02-07 09:18

Drawing on the discussion on conditional dplyr evaluation, I would like to conditionally execute a step in a pipeline depending on whether the reference column exists in the passed da

6 answers
  •  小蘑菇 (original poster)
    2021-02-07 10:14

    Avoid this trap:

    When in a hurry, one might write something like the following:

    library(dplyr)
    df <- data.frame(A = 1:3, B = letters[1:3], stringsAsFactors = FALSE)
    # Note the values in column "C": no error is thrown, but the result is wrong
    > df %>% mutate(C = ifelse("D" %in% colnames(.), D, B))
      A B C
    1 1 a a
    2 2 b a
    3 3 c a
    

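    Why? Because "D" %in% colnames(.) returns a single TRUE or FALSE, and ifelse() returns a result of the same length as its test. Here the test is a single FALSE, so ifelse() yields only the first element of B ("a"), and mutate() then recycles that one value across the whole column.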
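
The length behaviour can be checked outside of dplyr. The following is a minimal base R sketch; the names B, has_D, res_ifelse and res_if are made up for illustration and are not part of the original answer.

```r
# Stand-in for column B of the example data frame (illustrative only)
B <- c("a", "b", "c")

# A single logical value, like "D" %in% colnames(df) when D is absent
has_D <- "D" %in% c("A", "B")

# ifelse() shapes its result like the test, so only B[1] is kept
res_ifelse <- ifelse(has_D, NA, B)
length(res_ifelse)  # 1
res_ifelse          # "a"

# if/else returns the whole chosen object, so all of B is kept
res_if <- if (has_D) NA else B
length(res_if)      # 3
```

The length-1 result from ifelse() is what gets recycled down the column in the mutate() call above.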

    Correct way:

    > df %>% mutate( C = if("D" %in% colnames(.)) D else B)
      A B C
    1 1 a a
    2 2 b b
    3 3 c c
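
The same if/else idea can also guard a whole pipeline step rather than a single expression, which is what the question asks for. Below is a rough sketch under assumptions: run_if_col is a hypothetical helper written for this example (it is not a dplyr function), and the toupper() transformation is only a placeholder step.

```r
library(dplyr)

df <- data.frame(A = 1:3, B = letters[1:3], stringsAsFactors = FALSE)

# Hypothetical helper: apply `step` to `data` only when column `col` exists;
# otherwise pass `data` through unchanged.
run_if_col <- function(data, col, step) {
  if (col %in% colnames(data)) step(data) else data
}

# "D" is absent, so the step is skipped and df comes back unchanged
out_missing <- df %>%
  run_if_col("D", function(d) mutate(d, C = toupper(D)))

# "B" is present, so the step runs and column C is added
out_present <- df %>%
  run_if_col("B", function(d) mutate(d, C = toupper(B)))
```

Because the step is wrapped in a function, the reference to D is never evaluated when the column is missing, so no error is raised.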
    
