Using if else on a dataframe across multiple columns

情书的邮戳 2021-01-14 21:43

I have a large dataset of samples with descriptors of whether the sample is viable - it looks (kind of) like this, where 'desc' is the description column and 'blank' ind
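
The example data is cut off above. As a stand-in, here is a minimal sketch of a comparable data frame, assuming the column names `desc`, `x`, `y`, `z` and the blank/sample pattern seen in the answers' output. The values and the seed are made up for illustration and are not the asker's data.

```r
# Hypothetical stand-in for the question's data: one description column
# plus three numeric measurement columns. Values are randomly generated.
set.seed(1)
dat <- data.frame(
  desc = c("blank", "blank", "sample", "sample", "blank",
           "blank", "blank", "sample", "sample", "blank"),
  x = rnorm(10, mean = 4),
  y = rnorm(10, mean = 5),
  z = rnorm(10, mean = 5),
  stringsAsFactors = FALSE
)
str(dat)
```

The goal in the answers is then to set `x`, `y`, and `z` to `NA` wherever `desc` is "blank".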

6 Answers
  • 2021-01-14 22:17

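    Here's another dplyr solution with a small custom function and mutate() with across() (mutate_each() and funs(), used in older dplyr versions, are deprecated).

    library(dplyr)
    
    # Return NA for rows flagged "blank"; otherwise keep the original value
    f <- function(x, desc) if_else(desc == "blank", NA_real_, x)
    dat %>% 
      mutate(across(-desc, ~ f(.x, desc)))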
    #>      desc        x        y        z
    #> 1   blank       NA       NA       NA
    #> 2   blank       NA       NA       NA
    #> 3  sample 3.624941 6.430955 5.486632
    #> 4  sample 3.236359 4.935453 4.319202
    #> 5   blank       NA       NA       NA
    #> 6   blank       NA       NA       NA
    #> 7   blank       NA       NA       NA
    #> 8  sample 5.058725 6.751650 4.750529
    #> 9  sample 5.837206 4.323562 4.914780
    #> 10  blank       NA       NA       NA
    
  • 2021-01-14 22:19

    Using your initial loop-based approach, I came up with this:

    # Loop over the rows and blank out columns 2 to 4 for "blank" samples
    for (i in seq_len(nrow(dat))) {
      if (dat[i, 1] == "blank") {
        dat[i, 2:4] <- NA
      }
      # rows that are not "blank" are left unchanged
    }
    

    I tested it with your data and it worked. I hope this is useful for anyone dealing with conditional loops over rows and columns.

  • 2021-01-14 22:27

    Here is an option using set from data.table. It should be faster, as the overhead of [.data.table is avoided. We convert the 'data.frame' to a 'data.table' (setDT(df1)), loop through the column names of 'df1' (excluding the 'desc' column), and assign NA to the rows where the logical condition in 'i' is met.

    library(data.table)
    setDT(df1)
    for(j in names(df1)[-1]){
       set(df1, i= which(df1[["desc"]]=="blank"), j= j, value= NA)
    }
    df1
    #      desc        x        y        z
    # 1:  blank       NA       NA       NA
    # 2:  blank       NA       NA       NA
    # 3: sample 4.322014 4.798248 4.995959
    # 4: sample 3.997565 5.975604 7.160871
    # 5:  blank       NA       NA       NA
    # 6:  blank       NA       NA       NA
    # 7:  blank       NA       NA       NA
    # 8: sample 4.382937 5.926217 5.203737
    # 9: sample 4.976908 3.079191 4.614121
    #10:  blank       NA       NA       NA
    

    Or another option (based on @dww's comment)

    setDT(df1, key = "desc")["blank", names(df1)[-1] := NA][]
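
    Setting a key, as in the one-liner above, also sorts the table by `desc`, so the row order changes. Here is a sketch of a variant that leaves the order alone, assuming a small stand-in table `dt` (hypothetical, not the asker's data) and data.table's `(cols) := value` form for assigning to several columns by name:

    ```r
    library(data.table)

    # Hypothetical example table; only the column layout mirrors the question
    dt <- data.table(
      desc = c("blank", "sample", "blank", "sample"),
      x = c(1.1, 2.2, 3.3, 4.4),
      y = c(5.5, 6.6, 7.7, 8.8)
    )

    # Every column except 'desc' is overwritten by reference on matching rows
    cols <- setdiff(names(dt), "desc")
    dt[desc == "blank", (cols) := NA_real_]
    dt
    ```

    Using `NA_real_` rather than a bare `NA` keeps the numeric columns' type explicit.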
    
  • 2021-01-14 22:32

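    This should work, though note that it moves the blank rows to the bottom. Honestly, if the data is unusable, why not delete the rows altogether?

    library(dplyr)
    
    # Keep only the 'desc' column for the blank rows
    blanks <- 
      dat %>%
      filter(desc == "blank") %>%
      select(desc)
    
    # bind_rows() fills the columns missing from 'blanks' with NA
    dat %>%
      filter(desc == "sample") %>%
      bind_rows(blanks)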
    
  • 2021-01-14 22:34

    For your example dataset, any of the following will work.

    Option 1, name the columns to change:

    dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA
    

    In your actual data with 40 columns, if you just want to set the last 39 columns to NA, the following options may be simpler than naming each of the columns to change.

    Option 2, select columns using a range:

    dat[which(dat$desc == "blank"), 2:40] <- NA
    

    Option 3, exclude the 1st column:

    dat[which(dat$desc == "blank"), -1] <- NA
    

    Option 4, exclude a named column:

    dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA
    

    As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will help you to get a better understanding of the language.
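
    One detail worth noting in the options above is the `which()` wrapper. Here is a small sketch, assuming a toy data frame `toy` (hypothetical, not the asker's data) whose `desc` column contains a missing value. A bare logical index containing `NA` makes the data-frame assignment fail, while `which()` keeps only the positions where the comparison is `TRUE`.

    ```r
    # Hypothetical toy data with a missing description in row 2
    toy <- data.frame(
      desc = c("blank", NA, "sample"),
      x = c(1.5, 2.5, 3.5),
      y = c(4.5, 5.5, 6.5),
      stringsAsFactors = FALSE
    )

    # Bare logical index: NA == "blank" is NA, which is not allowed
    # in a subscripted assignment on a data frame
    bare <- tryCatch({
      tmp <- toy
      tmp[tmp$desc == "blank", -1] <- NA
      "ok"
    }, error = function(e) "error")

    # which() drops the NA and returns only the matching row positions
    toy[which(toy$desc == "blank"), -1] <- NA
    toy
    ```

    If `desc` never contains missing values, the bare logical index and the `which()` form behave the same.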

  • 2021-01-14 22:36

    You can use dplyr and a custom function to mutate values on certain conditions.

    library(dplyr)
    
    # Apply mutate() only to the rows where `condition` is TRUE
    mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
      condition <- eval(substitute(condition), .data, envir)
      .data[condition, ] <- .data[condition, ] %>% mutate(...)
      .data
    }
    
    data <- data %>% 
      mutate_cond(desc == "blank", x = NA, y = NA, z = NA)
