Change level of multiple factor variables

Submitted by 狂风中的少年 on 2019-12-24 10:57:31

Question


Hi everyone,

I want to preface this by saying that I already looked at this link to try to solve my problem:

Applying the same factor levels to multiple variables in an R data frame

The difference is that in that problem, the OP wanted to change the levels of factors that all had the same levels. In my instance, I'm looking to change just the first level, which is set to ' ', to something like 'Unknown' and leave the rest of the levels alone. I know I could do this in a "non-R" way with something like this:

for (i in 64:88) {
  var.name <- colnames(df[i])
  levels(eval(parse(text=paste('df$', var.name, sep=''))))[levels(eval(parse(text=paste('df$', var.name, sep='')))) == ' '] <- 'Unknown'
}

But that's an inefficient way to do it. Trying to use the method proposed in the question linked above gave me this code:

df[64:88] <- lapply(df[64:88], factor, levels=c('Unknown', ??))

I don't know what to put in place of the question marks. I tried using just "levels[-1]" but it's obvious why that didn't work. I also tried "levels(df[64:88])[-1]" but again no good. So I tried to revamp the code with the following:

df[64:88] <- lapply(df[64:88], function(x) levels(x)[levels(x) == ' '] <- 'Unknown')

but I get NULL whenever I call levels$transaction_type1 (where transaction_type1 is the column name of df[64]).

What am I missing here?

Thanks in advance for your help!

Per a couple of requests, here is an example of my data:

df$transaction_type1[1:100]
  [1]                                                                                                                                                
 [13] HOME RENEW                                                                                                                                     
 [25]                                                                                                                                                
 [37]                                                                                                                                                
 [49]                                                                                                                                                
 [61] AUTO MANAGE                                                                                     AUTO RENEW                                     
 [73]             AUTO MANAGE                                                                                     AUTO RENEW                         
 [85]                                                                                                                                                
 [97]                                                
Levels:   AUTO CLAIM AUTO MANAGE AUTO PURCHASE AUTO RENEW HOME CLAIM HOME RENEW

As you can see, there are a lot of values equal to ' ', and all 25 variables look just like this, but with different levels. My data consists of 222 variables and 24,850 rows, so I don't know what the standard is on SO for giving example data. This snippet of code might help as well:

> levels(df$transaction_type1)
#[1] " "             "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"

> levels(df$transaction_type1)[levels(df$transaction_type1) == ' '] <- 'Unknown'
> levels(df$transaction_type1)
#[1] "Unknown"       "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"   

If more information is needed, please let me know so I can provide it and also learn the SO standards of asking for help. Thanks!


Answer 1:


Something like this?

# it seems like your original data has a structure like this
df <- data.frame(x = factor(c("a", "", "b"), levels = c("", "a", "b")),
                 y = factor(c("c", "", "d"), levels = c("", "c", "d")))

lapply(df, levels)
# $x
# [1] ""  "a" "b"
# 
# $y
# [1] ""  "c" "d"    

# change the "" level to "unknown", and return the updated vector
df[] <- lapply(df, function(x) {
  levels(x)[levels(x) == ""] <- "unknown"
  x
})

lapply(df, levels)
# $x
# [1] "unknown" "a"       "b"      
# 
# $y
# [1] "unknown" "c"       "d"
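
To connect this back to the question: the lapply attempt there has an assignment as the last expression in the function, and in R an assignment evaluates to the assigned value. Each column is therefore replaced by the string 'Unknown' instead of the relabelled factor, which would be consistent with levels() returning NULL afterwards. Below is a minimal sketch of the same fix restricted to a range of columns. The toy data frame and the cols vector are made up for illustration (the question uses 64:88), and the blank level is assumed to be a single space, as in the levels() output shown in the question.

```r
# Toy data (hypothetical): one numeric column plus two factor columns
# whose first level is a single space.
df <- data.frame(
  id = 1:3,
  x  = factor(c("a", " ", "b"), levels = c(" ", "a", "b")),
  y  = factor(c("c", " ", "d"), levels = c(" ", "c", "d"))
)

# Hypothetical column selection; the question uses 64:88.
cols <- 2:3

# The attempt from the question: the function's last expression is an
# assignment, so it returns the assigned string, not the factor.
broken <- lapply(df[cols], function(x) levels(x)[levels(x) == " "] <- "Unknown")

# The fix: relabel the level, then return the modified factor explicitly.
df[cols] <- lapply(df[cols], function(x) {
  levels(x)[levels(x) == " "] <- "Unknown"
  x
})

lapply(df[cols], levels)
```

Only the columns in cols are touched, so the numeric id column is left alone, and any factor without a " " level passes through unchanged because the logical index selects nothing.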


Source: https://stackoverflow.com/questions/19137793/change-level-of-multiple-factor-variables
