Execute dplyr operation only if column exists

青春惊慌失措 2021-02-07 09:18

Drawing on the discussion on conditional dplyr evaluation, I would like to conditionally execute a step in a pipeline depending on whether the referenced column exists in the passed da

6 answers
  • 2021-02-07 09:51

    This code does the trick and is pretty flexible. The ^ and $ are regex anchors that force an exact match on the column name.

    library(dplyr)    # %>%
    library(purrr)    # set_names()
    library(stringr)  # str_replace()

    mtcars %>% 
      set_names(names(.) %>% 
                  str_replace("am","1") %>% 
                  str_replace("^cyl$","2") %>% 
                  str_replace("Doesn't Exist","3")
                  )
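If the goal is to rename only the columns that happen to exist, a swapped-in alternative (not the approach above) is dplyr's rename() combined with any_of() and a named lookup vector. A minimal sketch, assuming a recent dplyr release; the new column names are made up for illustration:

```r
library(dplyr)

# Hypothetical lookup: new_name = "old_name".
# The last entry does not exist in mtcars.
lookup <- c(transmission = "am", cylinders = "cyl", missing = "Doesn't Exist")

# any_of() silently skips lookup entries that are absent from the data
renamed <- mtcars %>% rename(any_of(lookup))
names(renamed)
```

Unlike the regex approach, this matches whole column names only, so no anchors are needed.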
    
  • 2021-02-07 09:52

    With across(), available in dplyr 1.0.0 and later, you can use any_of() when filtering. Compare the original, with all columns present:

    mtcars %>% 
      filter(am == 1) %>% 
      filter(cyl == 4)
    

    With cyl removed, it throws an error:

    mtcars %>% 
      select(!cyl) %>% 
      filter(am == 1) %>% 
      filter(cyl == 4)
    

    Using any_of() (note that you have to write the string "cyl", not the bare name cyl):

    mtcars %>% 
      select(!cyl) %>% 
      filter(am == 1) %>% 
      filter(across(any_of("cyl"), ~.x == 4))
    #N.B. this is equivalent to just filtering by `am == 1`.
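Newer dplyr releases provide if_all() and if_any() for use inside filter(), and the dplyr documentation points to them instead of across() in that position. A minimal sketch of the same idea, assuming dplyr 1.0.4 or later:

```r
library(dplyr)

no_cyl <- mtcars %>% select(!cyl)

# if_all() over an empty selection keeps every row,
# so the cyl condition is effectively skipped
res_absent <- no_cyl %>%
  filter(am == 1) %>%
  filter(if_all(any_of("cyl"), ~ .x == 4))

# With cyl present, the condition is applied as usual
res_present <- mtcars %>%
  filter(am == 1) %>%
  filter(if_all(any_of("cyl"), ~ .x == 4))

nrow(res_absent)
nrow(res_present)
```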
    
  • 2021-02-07 09:55

    Edit: Unfortunately, this was too good to be true

    I might be a bit late to the party. But is

    mtcars %>% 
     filter(am == 1) %>%
     try(filter(absent_column== 4))
    

    a solution?
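Why this does not work: the pipe inserts the data frame as the first argument of try(), so the call becomes try(., filter(...)). The filter() call lands in try()'s second parameter, silent, which is only looked at if the first argument raises an error. The sketch below suggests that the filter is skipped even when the column exists:

```r
library(dplyr)

# cyl exists in mtcars, yet the cyl filter is never applied
res <- mtcars %>%
  filter(am == 1) %>%
  try(filter(cyl == 4))

n_piped  <- nrow(res)                                  # rows with am == 1
n_wanted <- nrow(filter(mtcars, am == 1, cyl == 4))    # rows a working filter keeps
c(n_piped, n_wanted)
```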

  • 2021-02-07 09:59

    I know I'm late to the party, but here's an answer somewhat more in line with what you were originally thinking:

    mtcars %>%
      filter(am == 1) %>%
      {
        if("cyl" %in% names(.)) filter(., cyl == 4) else .
      }
    

    Basically, you were missing the . in filter. Inside an expression wrapped in {}, the pipe does not insert . as the first argument of filter(expr), so you have to pass it explicitly.
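If the same check is needed in several places, the if/else can be wrapped in a small function. A minimal sketch; filter_if_present() is a hypothetical helper name, not part of dplyr:

```r
library(dplyr)

# Hypothetical helper: keep rows where `col` equals `value`,
# but only if `col` exists; otherwise return the data unchanged
filter_if_present <- function(df, col, value) {
  if (col %in% names(df)) filter(df, .data[[col]] == value) else df
}

mtcars %>%
  filter(am == 1) %>%
  filter_if_present("cyl", 4)
```

Because the data frame is the first argument, the helper drops into a pipeline without braces.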

  • 2021-02-07 10:14

    Because of the way scoping works here, you cannot implicitly access the data frame from within your if statement. Fortunately, you don't need to.

    Try:

    mtcars %>%
      filter(am == 1) %>%
      filter({if("cyl" %in% names(.)) cyl else NULL} == 4)
    

    Here you can use the . object within the conditional to check whether the column exists and, if it does, return the column to the filter function.

    EDIT: as per docendo discimus' comment on the question, you can access the data frame, but not implicitly; i.e. you have to reference it explicitly with .
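One caveat, stated as an assumption about recent dplyr versions: when the column is missing, NULL == 4 evaluates to a zero-length logical vector, which filter() may reject. A sketch of a variant that returns TRUE in that case, so that all rows are kept:

```r
library(dplyr)

no_cyl <- mtcars %>% select(!cyl)

res <- no_cyl %>%
  filter(am == 1) %>%
  # `.` is the data frame on the left of the pipe;
  # a single TRUE is recycled and keeps every row
  filter(if ("cyl" %in% names(.)) cyl == 4 else TRUE)

nrow(res)
```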

  • 2021-02-07 10:14

    Avoid this trap:

    On a busy day, one might write something like the following:

    library(dplyr)
    df <- data.frame(A = 1:3, B = letters[1:3], stringsAsFactors = FALSE)
    df %>% mutate(C = ifelse("D" %in% colnames(.), D, B))
    # Notice the values in column "C": no error is thrown, but the result is wrong.
    #   A B C
    # 1 1 a a
    # 2 2 b a
    # 3 3 c a
    

    Why? Because "D" %in% colnames(.) returns a single TRUE or FALSE, ifelse() returns a result of length one (here the first element of B), and mutate() then recycles that value across the whole column.

    Correct way:

    df %>% mutate(C = if ("D" %in% colnames(.)) D else B)
    #   A B C
    # 1 1 a a
    # 2 2 b b
    # 3 3 c c
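The difference between the two forms can be checked in isolation, outside of mutate(). A minimal sketch using base R only:

```r
B <- c("a", "b", "c")

# ifelse() returns a result with the same length as its test, here length 1
from_ifelse <- ifelse(FALSE, NA, B)

# if/else returns the chosen branch whole
from_if <- if (FALSE) NA else B

length(from_ifelse)
length(from_if)
```

ifelse() is meant for element-wise tests on vectors; a single existence check is a scalar decision and belongs in if/else.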
    