Replace contents of factor column in R dataframe

我在风中等你 2020-11-28 20:59

I need to replace the levels of a factor column in a dataframe. Using the iris dataset as an example, how would I replace any cells which contain virginic

8 Answers
  • 2020-11-28 21:28

    For what you describe, you can simply rename the level with levels():

    levels(iris$Species)[3] <- 'new'
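
    Indexing by position depends on the order of the levels. A minimal sketch that looks the level up by name instead, working on a copy (`species` is an illustrative name, not from the question):

    ```r
    # Work on a copy so the built-in iris data stays untouched
    species <- iris$Species

    # Rename the level by name rather than by position
    levels(species)[levels(species) == "virginica"] <- "new"

    levels(species)
    table(species)
    ```

    All 50 rows that were "virginica" now carry the level "new", and the other two levels are unchanged.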
    
  • 2020-11-28 21:28

    If you have to replace several values and don't mind rebuilding the factor with as.factor(as.character(...)), you could try the following:

    replace.values <- function(search, replace, x){
      stopifnot(length(search) == length(replace))
      xnew <- replace[ match(x, search) ]
      takeOld <- is.na(xnew) & !is.na(x)
      xnew[takeOld] <- x[takeOld]
      return(xnew)
    }
    
    # Convert to character, swap the values, then rebuild the factor
    iris$Species <- as.factor(replace.values(
      search  = c("oldValue1", "oldValue2"),
      replace = c("newValue1", "newValue2"),
      x       = as.character(iris$Species)))
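
    A self-contained run of the helper above with real iris levels. The replacement labels "S" and "V" and the variable name `species` are made up for illustration:

    ```r
    replace.values <- function(search, replace, x) {
      stopifnot(length(search) == length(replace))
      xnew <- replace[match(x, search)]     # NA where x matches nothing in search
      takeOld <- is.na(xnew) & !is.na(x)    # unmatched, non-missing entries
      xnew[takeOld] <- x[takeOld]           # keep their original value
      xnew
    }

    # Replace two of the three species and leave "versicolor" alone
    species <- as.factor(replace.values(
      search  = c("setosa", "virginica"),
      replace = c("S", "V"),
      x       = as.character(iris$Species)
    ))

    table(species)
    ```

    Values not listed in `search` pass through unchanged, so "versicolor" keeps its 50 rows.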
    
  • 2020-11-28 21:34

    You can use the function revalue from the package plyr to replace values in a factor vector.

    In your example, to replace the level virginica with setosa:

     data(iris)
     library(plyr)
     iris$Species <- revalue(iris$Species, c("virginica" = "setosa"))
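
    If plyr is not available, the same merge can be sketched in base R: assigning a level name that already exists collapses the two levels into one. `species` is an illustrative copy, not part of the answer above:

    ```r
    species <- iris$Species

    # "virginica" is renamed to the existing level "setosa", so the two merge
    levels(species)[levels(species) == "virginica"] <- "setosa"

    nlevels(species)
    table(species)
    ```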
    
  • 2020-11-28 21:34

    I had the same problem. This worked better:

    Identify which level you want to modify: levels(iris$Species)

        "setosa" "versicolor" "virginica" 
    

    So, setosa is the first.

    Then, write this:

         levels(iris$Species)[1] <- "new name"
    
  • 2020-11-28 21:38

    Using dplyr::mutate and forcats::fct_recode:

    library(dplyr)
    library(forcats)
    
    iris <- iris %>%  
      mutate(Species = fct_recode(Species,
        "Virginica" = "virginica",
        "Versicolor" = "versicolor"
      )) 
    
    iris %>% 
      count(Species)
    
    # A tibble: 3 x 2
         Species     n
          <fctr> <int>
    1     setosa    50
    2 Versicolor    50
    3  Virginica    50   
    
  • 2020-11-28 21:39

    I bet the problem appears when you try to replace values with a new one that is not yet among the factor's levels:

    levels(iris$Species)
    # [1] "setosa"     "versicolor" "virginica" 
    

    Your example does not reproduce it, because 'setosa' is already a level, so this works:

    iris$Species[iris$Species == 'virginica'] <- 'setosa'
    

    This is what more likely creates the problem you were seeing with your own data:

    iris$Species[iris$Species == 'virginica'] <- 'new.species'
    # Warning message:
    # In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L,  :
    #   invalid factor level, NAs generated
    

    It works if you first add the new value to the factor's levels:

    levels(iris$Species) <- c(levels(iris$Species), "new.species")
    iris$Species[iris$Species == 'virginica'] <- 'new.species'
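
    After the replacement, "virginica" is still a level, just with no rows. A sketch of the whole sequence on a copy, assuming you also want the empty level removed (`species` is an illustrative name; `droplevels()` is base R):

    ```r
    species <- iris$Species

    # 1. Add the new level, 2. assign it to the matching rows
    levels(species) <- c(levels(species), "new.species")
    species[species == "virginica"] <- "new.species"

    table(species)    # virginica is still listed, with a count of 0

    # 3. Drop levels that no longer occur in the data
    species <- droplevels(species)
    levels(species)
    ```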
    