Converting factors to binary in R

眼角桃花 2020-12-01 19:08

I am trying to convert a factor variable into binary / boolean (0 or 1).

Sample data:

df <- data.frame(a = c(1,2,3), b = c(1,1,2), c = c("Rose",


        
4 Answers
  • 2020-12-01 19:37

    You can do this with reshaping:

    library(dplyr)
    library(tidyr)
    
    df %>%
      mutate(value = 1,
             c = paste0("Is", c)) %>%
      spread(c, value, fill = 0)
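
    In current tidyr releases, spread() is superseded by pivot_wider(). The sketch below shows the same reshaping idea with that function. The sample data is an assumed reconstruction (values taken from the output printed in the other answers), and names_prefix is used here in place of the paste0() step.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the sample data (not the asker's exact code).
df <- data.frame(
  a = c(1, 2, 3),
  b = c(1, 1, 2),
  c = factor(c("Rose", "Pink", "Red")),
  d = c(2, 3, 4)
)

wide <- df %>%
  mutate(value = 1) %>%            # marker that becomes the 1s
  pivot_wider(
    names_from   = c,              # one new column per level of c
    values_from  = value,
    names_prefix = "Is",           # IsRose, IsPink, IsRed
    values_fill  = 0               # absent combinations become 0
  )

print(wide)
```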
    
  • 2020-12-01 19:41

    In base R, you could use sapply() on the levels, using == to check for presence and as.integer() to coerce it to binary.

    cbind(df[1:2], sapply(levels(df$c), function(x) as.integer(x == df$c)), df[4])
    #   a b Pink Red Rose d
    # 1 1 1    0   0    1 2
    # 2 2 1    1   0    0 3
    # 3 3 2    0   1    0 4
    

    But since you have a million rows, you may want to go with data.table.

    library(data.table)
    setDT(df)[, c(levels(df$c), "c") := 
        c(lapply(levels(c), function(x) as.integer(x == c)), .(NULL))]
    

    which gives

    df
    #    a b d Pink Red Rose
    # 1: 1 1 2    0   0    1
    # 2: 2 1 3    1   0    0
    # 3: 3 2 4    0   1    0
    

    And you can reset the column order if you need to with setcolorder(df, c(1, 2, 4:6, 3)).

  • 2020-12-01 19:41
    library(ade4)  # provides acm.disjonctif()

    # Replace every factor column of df with 0/1 indicator columns,
    # keeping the numeric columns unchanged.
    dummy <- function(df) {
      # drop = FALSE keeps a data frame (and its column names)
      # even when only one column is selected
      num <- df[, sapply(df, is.numeric), drop = FALSE]
      fac <- df[, sapply(df, is.factor), drop = FALSE]

      data.frame(num, acm.disjonctif(fac))
    }
    
  • 2020-12-01 19:43

    This version uses dplyr and works inside a pipe. @bramtayl's answer is cleaner, but I couldn't find a way to pass it a custom variable name. This one is less clean but more DRY.

    # Requires dplyr (loaded below) for %>%, as_tibble(), mutate() and bind_cols().
    expand_factor <- function(df, variable) {
        # Formula `~ <variable> - 1`: no intercept, one column per level
        formulae <- as.formula(paste0("~ ", variable, " - 1"))

        # Keep rows with NA in the model matrix, then restore the option on exit
        old_opts <- options(na.action = "na.pass")
        on.exit(options(old_opts))
        expanded <- model.matrix(formulae, data = df)

        # Column names are "<variable><level>"; turn them into "is_<level>"
        colnames(expanded) <- paste0("is_", substring(colnames(expanded), nchar(variable) + 1))

        expanded <- expanded %>%
            as_tibble() %>%
            mutate(across(everything(), as.integer))

        bind_cols(df, expanded)
    }
    
    library(dplyr)
    df <- tibble(x = iris$Species, y = iris$Petal.Width)
    df <- rbind(tibble(x = NA, y = NA), df)
    
    df %>% 
        expand_factor('x')
    
    > df %>% 
    +   expand_factor('x')
    # A tibble: 151 × 5
            x     y is_setosa is_versicolor is_virginica
        <chr> <dbl>     <int>         <int>        <int>
    1    <NA>    NA        NA            NA           NA
    2  setosa   0.2         1             0            0
    3  setosa   0.2         1             0            0
    4  setosa   0.2         1             0            0
    5  setosa   0.2         1             0            0
    6  setosa   0.2         1             0            0
    7  setosa   0.4         1             0            0
    8  setosa   0.3         1             0            0
    9  setosa   0.2         1             0            0
    10 setosa   0.2         1             0            0
    # ... with 141 more rows
    