Replace <NA> in a factor column

南方客 2020-12-03 02:43

I want to replace <NA> values in a factor column with a valid value, but I cannot find a way. This example is only for demonstration. The original data

6 Answers
  • 2020-12-03 03:16

    1) addNA: If fac is a factor, addNA(fac) is the same factor but with NA added as a level. See ?addNA.

    To force the NA level to be 88:

    facna <- addNA(fac)
    levels(facna) <- c(levels(fac), 88)
    

    giving:

    > facna
     [1] 1  2  3  3  4  88 2  4  88 3 
    Levels: 1 2 3 4 88
    

    1a) This can be written in a single line as follows:

    `levels<-`(addNA(fac), c(levels(fac), 88))
    

    2) factor: It can also be done in one line using the arguments of factor, like this:

    factor(fac, levels = levels(addNA(fac)), labels = c(levels(fac), 88), exclude = NULL)
    

    2a) or equivalently:

    factor(fac, levels = c(levels(fac), NA), labels = c(levels(fac), 88), exclude = NULL)
    

    3) ifelse: Another approach is:

    factor(ifelse(is.na(fac), 88, paste(fac)), levels = c(levels(fac), 88))
    

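    4) forcats: The forcats package has a function for this (as of forcats 1.0.0, fct_explicit_na() is deprecated in favour of fct_na_value_to_level(fac, level = "88")):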

    library(forcats)
    
    fct_explicit_na(fac, "88")
    ## [1] 1  2  3  3  4  88 2  4  88 3 
    ## Levels: 1 2 3 4 88
    

    Note: We used the following as the input fac:

    fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L), .Label = c("1", 
    "2", "3", "4"), class = "factor")
    

    Update: Have improved (1) and added (1a). Later added (4).
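
    As a quick sanity check, the base R approaches above can be run side by side on the same input. This is only a sketch; the names r1, r2 and r3 are introduced here for illustration and are not part of the original answer.

    ```r
    # Input fac as given in the note above
    fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L),
                     .Label = c("1", "2", "3", "4"), class = "factor")

    # (1) addNA, then relabel the NA level as 88
    r1 <- addNA(fac)
    levels(r1) <- c(levels(fac), 88)

    # (2a) factor() with NA kept as a level via exclude = NULL
    r2 <- factor(fac, levels = c(levels(fac), NA),
                 labels = c(levels(fac), 88), exclude = NULL)

    # (3) ifelse on the character representation
    r3 <- factor(ifelse(is.na(fac), 88, paste(fac)),
                 levels = c(levels(fac), 88))

    # All three should carry the same labels and the same level set
    stopifnot(
      identical(as.character(r1), as.character(r2)),
      identical(as.character(r2), as.character(r3)),
      identical(levels(r1), levels(r2)),
      identical(levels(r2), levels(r3))
    )
    ```

    Note that 88 is coerced to the character label "88" in every variant, because factor levels are always stored as strings.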

  • 2020-12-03 03:17

    Another way to do it:

    #check levels
    levels(df$a)
    #[1] "3"  "4"  "7"  "9"  "10"
    
    #add new factor level. i.e 88 in our example
    df$a = factor(df$a, levels=c(levels(df$a), 88))
    
    #convert all NA's to 88
    df$a[is.na(df$a)] = 88
    
    #check levels again
    levels(df$a)
    #[1] "3"  "4"  "7"  "9"  "10" "88"
    
  • 2020-12-03 03:25

    My way is a bit more traditional and uses the factor function:

    a <- factor(a, 
                exclude = NULL, 
                levels = c(levels(a), NA),
                labels = c(levels(a), "None"))
    

    You can replace "None" with whatever replacement you want (0L, for example).
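
    For illustration, here is the same call on a small made-up factor (the sample values are an assumption, not the asker's data). It also shows that a numeric replacement such as 0L ends up as the character label "0".

    ```r
    # Hypothetical sample factor with two missing values
    a <- factor(c("x", NA, "y", "x", NA))

    # Keep NA as a level (exclude = NULL) and label it "None"
    a_none <- factor(a,
                     exclude = NULL,
                     levels = c(levels(a), NA),
                     labels = c(levels(a), "None"))

    # Same call with 0L as the replacement; it is coerced to the label "0"
    a_zero <- factor(a,
                     exclude = NULL,
                     levels = c(levels(a), NA),
                     labels = c(levels(a), 0L))

    levels(a_none)
    levels(a_zero)
    ```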

  • 2020-12-03 03:31

    The problem is that NA is not a level of that factor:

    > levels(df$a)
    [1] "2"  "4"  "5"  "9"  "10"
    

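    You can't assign 88 directly, but the following will do the trick:

    # go through character so the labels, not the internal codes, become numbers
    df$a <- as.numeric(as.character(df$a))
    # index by column name instead of assuming a is the first column
    df$a[is.na(df$a)] <- 88
    df$a <- as.factor(df$a)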
    > df$a
     [1] 9  88 3  9  5  9  88 8  3  9 
    Levels: 3 5 8 9 88
    > levels(df$a)
    [1] "3"  "5"  "8"  "9"  "88"
    
  • 2020-12-03 03:31

    I had similar issues and I want to add what I consider the most pragmatic (and also tidy) solution:

    Convert the column to character, use mutate with a simple ifelse to change the NA values to the level you want (I chose "None"), then convert it back to a factor:

    library(dplyr)  # provides mutate() and %>%

    df %>% mutate(
      a = as.character(a),
      a = ifelse(is.na(a), "None", a),
      a = as.factor(a)
    )
    

    Clean and painless because you do not actually have to dabble with NA values when they occur in a factor column. You bypass the weirdness and end up with a clean factor variable.

  • 2020-12-03 03:40

    The basic concept of a factor variable is that it can only take specific values, i.e., the levels. A value not in the levels is invalid.

    You have two possibilities:

    If you have a variable that follows this concept, make sure to define all levels when you create it, even those without corresponding values.

    Or make the variable a character variable and work with that.

    PS: Often these problems result from data import. For instance, what you show there looks like it should be a numeric variable and not a factor variable.
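
    A minimal sketch of these points, using made-up values (f, g, f1, ch and num are illustrative names, not from the question):

    ```r
    # A factor only accepts its own levels: anything else becomes NA
    f <- factor(c("3", "4", NA, "9"))
    g <- f
    suppressWarnings(g[1] <- "88")  # R warns about an invalid factor level here
    is.na(g[1])                     # the assignment produced NA, not "88"

    # Possibility 1: declare every level up front, including unused ones
    f1 <- factor(c("3", "4", NA, "9"), levels = c("3", "4", "9", "88"))
    f1[is.na(f1)] <- "88"           # works because "88" is a declared level

    # Possibility 2: work with a character vector instead
    ch <- as.character(f)
    ch[is.na(ch)] <- "88"

    # PS case: if the column should have been numeric, convert via character
    num <- as.numeric(as.character(f))
    ```

    Converting via as.character first matters: as.numeric(f) on its own would return the internal integer codes rather than the labels.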
