Nested ifelse statement

逝去的感伤 2020-11-22 04:02

I'm still learning how to translate SAS code into R and I get warnings. I need to understand where I'm making mistakes. What I want to do is create a variable which summ

9 answers
  • 2020-11-22 04:21

    If you have used any spreadsheet application, you know its basic function if(), with the syntax:

    if(<condition>, <yes>, <no>)
    

    The syntax is exactly the same for ifelse() in R:

    ifelse(<condition>, <yes>, <no>)
    

    The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized: it takes vectors as input and returns a vector as output. Compare the formulas in a spreadsheet and in R for an example where we test whether a > b and return 1 if so and 0 if not.

    In spreadsheet:

      A  B C
    1 3  1 =if(A1 > B1, 1, 0)
    2 2  2 =if(A2 > B2, 1, 0)
    3 1  3 =if(A3 > B3, 1, 0)
    

    In R:

    > a <- 3:1; b <- 1:3
    > ifelse(a > b, 1, 0)
    [1] 1 0 0
    

    ifelse() can be nested in many ways:

    ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))
    
    ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)
    
    ifelse(<condition>, 
           ifelse(<condition>, <yes>, <no>), 
           ifelse(<condition>, <yes>, <no>)
          )
    
    ifelse(<condition>, <yes>, 
           ifelse(<condition>, <yes>, 
                  ifelse(<condition>, <yes>, <no>)
                 )
           )
    

    To calculate the column idnat2, you can use:

    df <- read.table(header=TRUE, text="
    idnat idbp idnat2
    french mainland mainland
    french colony overseas
    french overseas overseas
    foreign foreign foreign"
    )
    
    with(df, 
         ifelse(idnat=="french",
           ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
         )
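
    The call above only prints its result. As a small self-contained sketch (the variable name computed is an assumption added for illustration), the same expression can be stored and checked against the expected idnat2 column that ships with the sample data:

```r
# Rebuild the four-row example from the answer, keeping strings as character
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign")

# Outer test: french or not; inner test: overseas territory or mainland
computed <- with(df,
  ifelse(idnat == "french",
         ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
         "foreign"))

# Compare the computed values with the expected column
identical(computed, df$idnat2)
```

    Keeping the inner ifelse() on its own line makes it easier to see which "no" branch belongs to which condition.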
    

    R Documentation

    What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

    > # What is first condition really testing?
    > with(df, idnat=="french")
    [1]  TRUE  TRUE  TRUE FALSE
    > # This is the result of a vectorized operation: every element of idnat is
    > # compared with the string "french".
    > # A logical vector of the same length as idnat is returned.
    > df$idnat2 <- with(df,
    +   if(idnat=="french"){
    +   idnat2 <- "xxx"
    +   }
    +   )
    Warning message:
    In if (idnat == "french") { :
      the condition has length > 1 and only the first element will be used
    > # Note that the first element of the comparison is TRUE, and that's why we get:
    > df
        idnat     idbp idnat2
    1  french mainland    xxx
    2  french   colony    xxx
    3  french overseas    xxx
    4 foreign  foreign    xxx
    > # There really is logic in it; you just have to get used to it
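
    A minimal sketch of the same point, assuming you really do want one decision for the whole vector: reduce the logical vector to a single value with all() or any() before passing it to if. The sample vector and the messages are made up for illustration.

```r
idnat <- c("french", "french", "french", "foreign")
cond <- idnat == "french"

# One logical value per element, so `if` cannot use this directly
length(cond)

# Collapse to a single TRUE/FALSE before handing it to `if`
if (all(cond)) {
  verdict <- "everyone is french"
} else if (any(cond)) {
  verdict <- "some are french"
} else {
  verdict <- "nobody is french"
}
verdict
```

    If a per-row answer is what you need, which is the case in the question, ifelse() remains the right tool.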
    

    Can I still use if()? Yes, you can, but the syntax is not so cool :)

    test <- function(x) {
      if(x=="french") {
        "french"
      } else{
        "not really french"
      }
    }
    
    apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
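
    The same idea can be sketched with vapply(), which may read more naturally than apply() over an array. The helper name label_one and the sample vector are assumptions for illustration, not part of the original answer.

```r
# Scalar helper: `if` is fine here because x is always a single value
label_one <- function(x) {
  if (x == "french") "french" else "not really french"
}

idnat <- c("french", "french", "french", "foreign")

# vapply calls label_one once per element and checks that
# every result is a single character string
labels <- vapply(idnat, label_one, FUN.VALUE = character(1), USE.NAMES = FALSE)
labels
```

    This is still an element-by-element loop under the hood, so ifelse() is usually the shorter choice for simple recoding.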
    

    If you are familiar with SQL, you can also use a CASE statement via the sqldf package.

  • 2020-11-22 04:21

    Try something like the following:

    # some sample data
    idnat <- sample(c("french","foreigner"),100,TRUE)
    idbp <- rep(NA,100)
    idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)
    
    # recoding
    out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
                  ifelse(idbp %in% c("overseas","colony"),"overseas",
                         "foreigner"))
    cbind(idnat,idbp,out) # check result
    

    Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized: they test a single condition (e.g., if("french"=="french") works) and cannot handle a vector of logicals (e.g., if(c("french","foreigner")=="french") does not work), which is why R gives you the warning you're seeing. Since R 4.2.0, a condition of length greater than one is an error rather than a warning.

    By contrast, ifelse is vectorized, so it can take your vectors (i.e., input variables) and test the logical condition on each of their elements, as you're used to in SAS. An alternative way to wrap your head around this is to build a loop using if and else statements (as you've started to do here), but the vectorized ifelse approach will be more efficient and generally involve less code.
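
    To make the comparison concrete, here is a rough sketch of both approaches on a tiny made-up sample (three rows, chosen only for illustration). The loop hands if exactly one element per iteration, and the nested ifelse does the same work in one expression.

```r
idnat <- c("french", "foreigner", "french")
idbp  <- c("mainland", NA, "colony")

# Loop version: `if` sees exactly one element per iteration
out_loop <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] != "french") {
    out_loop[i] <- "foreigner"
  } else if (idbp[i] %in% c("overseas", "colony")) {
    out_loop[i] <- "overseas"
  } else {
    out_loop[i] <- "mainland"
  }
}

# Vectorized version: the whole vectors are tested at once
out_vec <- ifelse(idnat != "french", "foreigner",
                  ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"))

# Both approaches should agree element by element
identical(out_loop, out_vec)
```

    Note that %in% returns FALSE for NA, which is why the missing idbp of the foreigner row causes no trouble in either version.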

  • 2020-11-22 04:22

    If the data set contains many rows it might be more efficient to join with a lookup table using data.table instead of nested ifelse().

    Given the lookup table below

    lookup
    
         idnat     idbp   idnat2
    1:  french mainland mainland
    2:  french   colony overseas
    3:  french overseas overseas
    4: foreign  foreign  foreign
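
    The answer prints lookup without showing how it was built. One plausible way to construct it, sketched here as an assumption (the original may have created it differently), is a direct data.table() call:

```r
library(data.table)

# Each row maps one (idnat, idbp) combination to its idnat2 value
lookup <- data.table(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign")
)
lookup
```

    Adding a new category later means adding a row to this table rather than another level of nesting.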
    

    and a sample data set

    library(data.table)
    n_row <- 10L
    set.seed(1L)
    DT <- data.table(idnat = "french",
                     idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
    DT[idbp == "foreign", idnat := "foreign"][]
    
          idnat     idbp
     1:  french   colony
     2:  french   colony
     3:  french overseas
     4: foreign  foreign
     5:  french mainland
     6: foreign  foreign
     7: foreign  foreign
     8:  french overseas
     9:  french overseas
    10:  french mainland
    

    then we can do an update while joining:

    DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]
    
          idnat     idbp   idnat2
     1:  french   colony overseas
     2:  french   colony overseas
     3:  french overseas overseas
     4: foreign  foreign  foreign
     5:  french mainland mainland
     6: foreign  foreign  foreign
     7: foreign  foreign  foreign
     8:  french overseas overseas
     9:  french overseas overseas
    10:  french mainland mainland
    
  • 2020-11-22 04:22

    With data.table, the solution is:

    DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign", 
            ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland" ))]
    

    ifelse is vectorized; if-else is not. Here, DT is:

        idnat     idbp
    1  french mainland
    2  french   colony
    3  french overseas
    4 foreign  foreign
    

    This gives:

       idnat     idbp   idnat2
    1:  french mainland mainland
    2:  french   colony overseas
    3:  french overseas overseas
    4: foreign  foreign  foreign
    
  • 2020-11-22 04:29

    The explanation with the examples was key to solving my problem, but when I copied the code it didn't work, so I had to tinker with it in several ways to get it right. (I'm very new to R and had trouble with the third ifelse due to lack of knowledge.)

    So, for those who are very new to R and running into issues...

       ifelse(x < -2,"pretty negative", ifelse(x < 1,"close to zero", ifelse(x < 3,"in [1, 3)","large")##all one line
         )#normal tab
    )
    

    (I used this inside a function, so the "ifelse..." line was indented one level, but the last ")" was flush left.)
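
    For numeric ranges like these, base R's cut() is a different technique that avoids counting parentheses altogether. The sketch below is an alternative, not what the answer above used. The sample values in x are made up, and right = FALSE makes each interval closed on the left, matching the x < ... tests.

```r
x <- c(-3, 0, 2, 5)

# Nested ifelse, laid out one condition per line
nested <- ifelse(x < -2, "pretty negative",
          ifelse(x < 1, "close to zero",
          ifelse(x < 3, "in [1, 3)", "large")))

# Same bins expressed as break points; intervals are [low, high)
binned <- cut(x,
              breaks = c(-Inf, -2, 1, 3, Inf),
              labels = c("pretty negative", "close to zero", "in [1, 3)", "large"),
              right = FALSE)

# cut() returns a factor, so convert before comparing
identical(nested, as.character(binned))
```

    With cut(), adding another range means adding one break and one label.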

  • 2020-11-22 04:33

    Using the SQL CASE statement with the dplyr and sqldf packages:

    Data

    df <-structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", 
    "french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 
    2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat", 
    "idbp"), class = "data.frame", row.names = c(NA, -4L))
    

    sqldf

    library(sqldf)
    sqldf("SELECT idnat, idbp,
            CASE 
              WHEN idbp IN ('colony', 'overseas') THEN 'overseas' 
              ELSE idbp 
            END AS idnat2
           FROM df")
    

    dplyr

    library(dplyr)
    df %>% 
    # columns can be referenced by name inside mutate(); the .$ prefix is not needed
    mutate(idnat2 = case_when(idbp == 'mainland' ~ "mainland", 
                              idbp %in% c("colony", "overseas") ~ "overseas", 
                              TRUE ~ "foreign"))
    

    Output

        idnat     idbp   idnat2
    1  french mainland mainland
    2  french   colony overseas
    3  french overseas overseas
    4 foreign  foreign  foreign
    