Add extra level to factors in dataframe

时光取名叫无心 2020-11-30 04:09

I have a data frame with numeric and ordered factor columns. I have a lot of NA values, so no level is assigned to them. I changed NA to "No Answer", but levels of the facto

5 answers
  • 2020-11-30 04:32

    I have a very simple answer that may not directly address your specific scenario, but it is a simple way to do this in general:

    levels(df$column) <- c(levels(df$column), newFactorLevel)
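
    Why the level has to exist before the value is assigned can be seen in a minimal sketch. The toy factor `f` and its values are made up for illustration and are not from the question:

    ```r
    # Hypothetical toy factor (not from the original post)
    f <- factor(c("yes", NA, "no"))

    # Assigning a value that is not yet a level leaves the entries NA
    # (R also emits an "invalid factor level" warning, suppressed here)
    g <- f
    suppressWarnings(g[is.na(g)] <- "No Answer")
    anyNA(g)

    # Appending the level first makes the same assignment succeed
    levels(f) <- c(levels(f), "No Answer")
    f[is.na(f)] <- "No Answer"
    anyNA(f)
    ```

    For an ordered factor the appended level sorts last, that is, as the highest level, so the levels may need reordering if that matters.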
    
  • 2020-11-30 04:40

    You could define a function that adds the level to a factor and returns anything else unchanged:

    addNoAnswer <- function(x){
      if(is.factor(x)) return(factor(x, levels=c(levels(x), "No Answer")))
      return(x)
    }
    

    Then you just lapply this function to your columns

    df[] <- lapply(df, addNoAnswer)  # df[] <- keeps df a data frame with its original column names
    

    This adds the "No Answer" level to every factor column; existing NA values still have to be replaced afterwards.
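
    A short sketch of the whole round trip, under assumed data: the data frame and its column names `score` and `answer` are hypothetical. It shows that the numeric column passes through untouched and that the NA values in the factor column still need a separate fill step.

    ```r
    addNoAnswer <- function(x) {
      if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
      return(x)
    }

    # Hypothetical example data: one numeric and one factor column
    df <- data.frame(score = c(1.5, NA, 3),
                     answer = factor(c("yes", NA, "no")))

    df[] <- lapply(df, addNoAnswer)
    levels(df$answer)      # now includes "No Answer"
    anyNA(df$answer)       # the NA itself is still there

    # Fill NAs in factor columns only; numeric NAs stay NA
    isFac <- vapply(df, is.factor, logical(1))
    df[isFac] <- lapply(df[isFac], function(x) {
      x[is.na(x)] <- "No Answer"
      x
    })
    ```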

  • 2020-11-30 04:41

    Expanding on ilir's answer and its comment, you can check that a column is a factor and that it does not already contain the new level, and only then add the level. This makes the function re-runnable:

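    addLevel <- function(x, newlevel=NULL) {
      if(is.factor(x)) {
        # setdiff() keeps only the levels that are not already present;
        # it is empty when newlevel is NULL, so the NULL default is safe
        toAdd <- setdiff(newlevel, levels(x))
        if (length(toAdd) > 0)
          return(factor(x, levels=c(levels(x), toAdd)))
      }
      return(x)
    }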
    

    You can then apply it like so:

    dataFrame$column <- addLevel(dataFrame$column, "newLevel")
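
    The re-runnable property can be checked with a small sketch. It uses a `setdiff()`-based variant of `addLevel` and made-up inputs; both are illustrative assumptions rather than part of the original answer:

    ```r
    # Variant of addLevel: add only the levels that are still missing
    addLevel <- function(x, newlevel = NULL) {
      if (is.factor(x)) {
        toAdd <- setdiff(newlevel, levels(x))
        if (length(toAdd) > 0) {
          return(factor(x, levels = c(levels(x), toAdd)))
        }
      }
      x
    }

    f <- factor(c("a", NA, "b"))          # hypothetical input
    once  <- addLevel(f, "newLevel")
    twice <- addLevel(once, "newLevel")   # second call changes nothing
    identical(once, twice)

    nums <- addLevel(c(1, 2, NA), "newLevel")  # non-factors come back as-is
    ```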
    
  • 2020-11-30 04:43

    Since this question was last answered, this has become possible with fct_explicit_na() from the forcats package (since forcats 1.0.0 the function is deprecated in favour of fct_na_value_to_level()). Here is the example given in the documentation.

    f1 <- factor(c("a", "a", NA, NA, "a", "b", NA, "c", "a", "c", "b"))
    table(f1)
    
    # f1
    # a b c 
    # 4 2 2 
    
    f2 <- forcats::fct_explicit_na(f1)
    table(f2)
    
    # f2
    #     a         b         c (Missing) 
    #     4         2         2         3 
    

    The default level is (Missing), but this can be changed via the na_level argument.
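
    A minimal sketch of the na_level argument, assuming the forcats package is installed; the short factor is made up for illustration. Recent forcats releases mark fct_explicit_na() as deprecated, so the call may emit a warning, which is suppressed here.

    ```r
    # Runs only when forcats is available
    if (requireNamespace("forcats", quietly = TRUE)) {
      f1 <- factor(c("a", NA, "b"))   # hypothetical input
      f2 <- suppressWarnings(
        forcats::fct_explicit_na(f1, na_level = "No Answer")
      )
      print(levels(f2))
      print(anyNA(f2))
    }
    ```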

  • 2020-11-30 04:47

    The levels function has a replacement form, levels(x) <- value, so it is very easy to add new levels:

    f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
    str(f1)
    #  Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
    levels(f1) <- c(levels(f1),"No Answer")
    f1[is.na(f1)] <- "No Answer"
    str(f1)
    #  Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
    

    You can then loop it around all variables in a data.frame:

    f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
    f2 <- factor(c("c", NA, "b", NA, "b", NA, "c" ,"a", "d", "a", "b"))
    f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d" , "e", "a"))
    df1 <- data.frame(f1,n1=1:11,f2,f3)
    
    str(df1)
    # 'data.frame':   11 obs. of  4 variables:
    #  $ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
    #  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
    #  $ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
    #  $ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...
    
    # Restrict the changes to factor columns, so that an NA in a numeric
    # column is not overwritten with the string "No Answer"
    for (i in seq_along(df1)) {
      if (is.factor(df1[[i]])) {
        levels(df1[[i]]) <- c(levels(df1[[i]]), "No Answer")
        df1[[i]][is.na(df1[[i]])] <- "No Answer"
      }
    }
    
    str(df1)
    # 'data.frame':   11 obs. of  4 variables:
    #  $ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
    #  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
    #  $ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
    #  $ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
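
    One caveat when filling NA values across a whole data frame: an assignment of the form d[is.na(d)] <- "No Answer" also reaches non-factor columns. The sketch below uses a made-up data frame `d` whose numeric column contains an NA, which is not the case in df1 above:

    ```r
    # Hypothetical data: the numeric column n has a missing value
    d <- data.frame(n = c(1, NA, 3), f = factor(c("a", NA, "b")))
    levels(d$f) <- c(levels(d$f), "No Answer")

    # Whole-frame assignment: the numeric column is coerced to character
    whole <- d
    whole[is.na(whole)] <- "No Answer"
    class(whole$n)

    # Factor-only assignment: the numeric column keeps its type and its NA
    facOnly <- d
    facOnly$f[is.na(facOnly$f)] <- "No Answer"
    class(facOnly$n)
    ```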
    