How to make the levels of a factor in a data frame consistent across all columns?

扶醉桌前 submitted on 2019-12-03 13:11:24

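You can give every column of the dataset "df" the same levels, in the same order, by looping over the columns (lapply), converting each one to a factor again with the specified levels, and assigning the result back to the corresponding columns. Assigning into df[] keeps the object a data.frame instead of replacing it with a plain list.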

lvls <- c('PASS', 'WARN', 'FAIL')
df[] <-  lapply(df, factor, levels=lvls)
str(df)
# 'data.frame': 5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
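
One thing worth checking before relying on this: factor() silently turns any value that is not listed in levels into NA. The following is a minimal sketch of a sanity check, not part of the original answer. It uses a hypothetical toy data frame (toy) with a made-up value "SKIP" rather than the df above.

```r
lvls <- c("PASS", "WARN", "FAIL")

# Hypothetical toy data; "SKIP" is deliberately absent from lvls
toy <- data.frame(A = factor(c("PASS", "FAIL", "SKIP")),
                  B = factor(c("WARN", "WARN", "PASS")))

# Values that occur in the data but are missing from lvls
extra <- setdiff(unique(unlist(lapply(toy, as.character), use.names = FALSE)), lvls)

# Re-level every column with the common level set
toy[] <- lapply(toy, factor, levels = lvls)

# TRUE if every column has identical levels in identical order
same_levels <- all(vapply(toy, function(x) identical(levels(x), lvls), logical(1)))

# Number of values in column A that were lost (became NA)
n_lost <- sum(is.na(toy$A))
```

If extra is non-empty, either add those values to lvls or accept that they become NA after the conversion.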

If you prefer data.table:

library(data.table)
setDT(df)[, names(df):= lapply(.SD, factor, levels=lvls)]

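setDT converts the "data.frame" to a "data.table" by reference, without copying. The := operator then assigns the re-converted factor columns (lapply(..)) back to the columns named by names(df). .SD stands for "Subset of Data": a data.table holding the columns being operated on, which here is all of them.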
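
When only some columns hold test results, the same idea can be restricted with .SDcols. This is a sketch under assumptions: the toy data, the id column and the cols vector are hypothetical and not from the original question. It uses as.data.table() instead of setDT() so the input data.frame is left untouched.

```r
library(data.table)

lvls <- c("PASS", "WARN", "FAIL")

# Hypothetical toy data with one non-factor column (id)
toy <- data.frame(Test1 = factor(c("PASS", "PASS")),
                  Test2 = factor(c("FAIL", "WARN")),
                  id    = c(1L, 2L))

dt   <- as.data.table(toy)      # makes a copy; toy itself is not modified
cols <- c("Test1", "Test2")     # only these columns are re-levelled

# (cols) on the left-hand side is evaluated to the column names;
# .SD contains just the columns listed in .SDcols
dt[, (cols) := lapply(.SD, factor, levels = lvls), .SDcols = cols]

str(dt)
```

Note that data.table does not keep row names, so labels such as Sample1..Sample5 would need to be stored in a regular column first (setDT and as.data.table both take a keep.rownames argument for that).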

data

df <- structure(list(
  Test1 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "PASS", class = "factor"),
  Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL", "PASS", "WARN"), class = "factor"),
  Test3 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"),
  Test4 = structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"), class = "factor"),
  Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "WARN", class = "factor")),
  .Names = c("Test1", "Test2", "Test3", "Test4", "Test5"),
  row.names = c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"),
  class = "data.frame")

Using dplyr:

library(dplyr)
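# mutate_each() and funs() are deprecated in current dplyr;
# across() (dplyr >= 1.0.0) is the replacement
df <- df %>% mutate(across(everything(), ~ factor(.x, levels = c('PASS', 'WARN', 'FAIL'))))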

You get:

#> str(df)
#'data.frame':  5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2

A more general approach, for when the data.frame may contain other string values and NA: collect the unique values across all columns and use them, minus NA, as the common levels.

library(magrittr)

fac = df %>% as.matrix %>% as.vector %>% unique
df1 = data.frame(lapply(df, factor, levels = fac[!is.na(fac)]))
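
With this approach the levels come out in order of first appearance in the data. If a preferred order matters, one option is to put the known levels first and append whatever else turns up. This is a base R sketch under assumptions: the toy data, the value "SKIP" and the names preferred, found and all_lvls are hypothetical.

```r
preferred <- c("PASS", "WARN", "FAIL")

# Hypothetical toy data with an NA and an unexpected value ("SKIP")
toy <- data.frame(A = c("FAIL", NA, "SKIP"),
                  B = c("PASS", "WARN", "PASS"),
                  stringsAsFactors = TRUE)

# All distinct non-NA values across every column
found <- unique(unlist(lapply(toy, as.character), use.names = FALSE))
found <- found[!is.na(found)]

# Preferred levels first, then anything else in order of appearance
all_lvls <- union(preferred, found)

# Re-level in place; NA stays NA and does not become a level
toy[] <- lapply(toy, factor, levels = all_lvls)

str(toy)
```

Because the columns are assigned back through toy[], the data.frame's row names are kept, which data.frame(lapply(...)) does not do.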