I build a vector of factors containing NA.
my_vec <- factor(c(NA, "a", "b"), exclude = NULL)
levels(my_vec)
# [1] "a" "b" NA
I change on
You have to quote NA, otherwise R treats it as a missing value rather than a factor level. Factor levels sort alphabetically by default, but that's not always useful, so you can specify a different order by passing the reordered levels as the levels argument of factor().
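To make the difference concrete, here is a minimal sketch in base R. The variable names (f_missing, f_string, f_ordered) are made up for illustration and are not from the thread.

```r
# A real NA is a missing value: it is not a level unless exclude = NULL is used
f_missing <- factor(c(NA, "a", "b"))
levels(f_missing)   # "a" "b"
anyNA(f_missing)    # TRUE, the first element is missing

# A quoted "NA" is an ordinary string, so it becomes a regular level
f_string <- factor(c("NA", "a", "b"))
"NA" %in% levels(f_string)   # TRUE
anyNA(f_string)              # FALSE, nothing is missing

# Spelling the levels out fixes their order regardless of the default sort
f_ordered <- factor(c("NA", "a", "b"), levels = c("a", "NA", "b"))
levels(f_ordered)   # "a" "NA" "b"
```

Writing the levels out by name rather than by position avoids depending on how the default sort places "NA" relative to lowercase letters, which can differ between locales.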
require(plyr)
my_vec <- factor(c("NA","a","b1","b2"))
vec2 <- revalue(my_vec,c("b1"="c","b2"="c"))
# now reorder levels (positions refer to the current order of levels(vec2))
my_vec2 <- factor(vec2, levels = levels(vec2)[c(1,3,2)])
# Levels: a NA c
I finally created a function that first replaces the NA value with a temp one (inspired by @lmo), then does the replacement I wanted the standard way, then puts NA back in its place using @rawr's suggestion.
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
my_vec <- level_sub(my_vec,c("b1","b2"),"c")
my_vec
# [1] <NA> a    c    c
# Levels: a <NA> c
As a bonus, level_sub can be used with na_rep = NULL, which will remove the NA level, and it will look good in pipe chains :).
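# Replace the levels listed in `from` with `to`, keeping an NA level in place.
# na_rep is a temporary stand-in for the NA level; na_rep = NULL drops it instead.
level_sub <- function(x, from, to, na_rep = "NA") {
  # temporarily give the NA level a real name so `levels<-` does not drop it
  if (!is.null(na_rep)) levels(x)[is.na(levels(x))] <- na_rep
  levels(x)[levels(x) %in% from] <- to
  # put NA back by writing the attribute directly, bypassing `levels<-`
  if (!is.null(na_rep)) attr(x, "levels")[levels(x) == na_rep] <- NA
  x
}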
Nevertheless, it seems that R really doesn't want you to add NA to factor levels. levels(my_vec) <- c(NA,"a") behaves strangely, and it doesn't stop there. While subset will keep NA levels in your columns, rbind will quietly remove them! I wouldn't be surprised if further investigation revealed that half of R's functions remove NA levels, making them very unsafe to work with...
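For what it's worth, the odd behavior of assigning NA through levels<- can be reproduced on a small factor. This is only a sketch in base R with made-up variables f and g; it illustrates the levels<- case, not the rbind one.

```r
f <- factor(c("x", "y"))
levels(f) <- c(NA, "y")   # NA is not stored as a level here
levels(f)                 # "y"
is.na(f)                  # TRUE FALSE, the former "x" values are now missing

# Writing the attribute directly bypasses `levels<-` and keeps NA as a level
g <- factor(c("x", "y"))
attr(g, "levels")[1] <- NA
levels(g)                 # NA "y"
is.na(g)                  # FALSE FALSE, the codes still point at a level
```

So the replacement function turns the affected values into missing ones instead of creating an NA level, which is why the function above restores NA through attr() rather than through levels().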