How to make the levels of a factor in a data frame consistent across all columns?

扶醉桌前 submitted on 2019-12-03 13:11:24

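You can give every column of the dataset "df" the same levels, in the same order, by looping over the columns with lapply, converting each one to a factor again with the specified levels, and assigning the result back to the corresponding columns. Using df[] on the left-hand side keeps the object a data.frame instead of replacing it with a list.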

lvls <- c('PASS', 'WARN', 'FAIL')
df[] <- lapply(df, factor, levels = lvls)
str(df)
# 'data.frame': 5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
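One caveat with this approach: factor() silently turns any value that is not in the supplied levels into NA. The sketch below illustrates this on a made-up vector (the value "SKIP" is hypothetical and not part of the original data) and shows a quick check you could run before converting.

```r
# Levels as defined in the answer above
lvls <- c("PASS", "WARN", "FAIL")

# Hypothetical input containing a value ("SKIP") that is not in lvls
x <- c("PASS", "SKIP", "FAIL")

# Values outside the supplied levels become NA without a warning
f <- factor(x, levels = lvls)
print(f)

# Check to run before converting: which values would be lost?
unexpected <- setdiff(unique(x), lvls)
print(unexpected)
```

If unexpected is non-empty, either extend lvls or use the more general approach that builds the levels from the data.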

If you opt to use data.table:

library(data.table)
setDT(df)[, names(df) := lapply(.SD, factor, levels = lvls)]

setDT converts the "data.frame" to a "data.table" by reference. The := operator then assigns the re-created factor columns (lapply(.SD, ...)) back to the columns named in names(df). .SD stands for "Subset of Data.table".


# Example data used in this answer
df <- structure(list(
  Test1 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "PASS", class = "factor"),
  Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL", "PASS", "WARN"), class = "factor"),
  Test3 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"),
  Test4 = structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN", "FAIL"), class = "factor"),
  Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "WARN", class = "factor")),
  .Names = c("Test1", "Test2", "Test3", "Test4", "Test5"),
  row.names = c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5"),
  class = "data.frame")

Using dplyr:

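library(dplyr)
# mutate_each() and funs() are deprecated; across() needs dplyr >= 1.0.0
df <- df %>% mutate(across(everything(), ~ factor(.x, levels = c('PASS', 'WARN', 'FAIL'))))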

You get:

#> str(df)
#'data.frame':  5 obs. of  5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2

A more general approach, for a data.frame that may contain other string values and NA, is to build the levels from the data itself:


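# collect every distinct value across all columns (%>% needs dplyr or magrittr loaded)
fac <- df %>% as.matrix %>% as.vector %>% unique
# the index was truncated in the source ("fac[!]"); dropping NA is the presumed intent
df1 <- data.frame(lapply(df, factor, levels = fac[!is.na(fac)]))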
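As a worked illustration of the general approach, here is a minimal base R sketch on made-up data. The data frame df2, the value "SKIP" and the variable vals are hypothetical and only serve the example. It assumes that NA should stay a missing value, not become a level.

```r
# Hypothetical data with an extra status ("SKIP") and a missing value
df2 <- data.frame(
  Test1 = c("PASS", "PASS", NA),
  Test2 = c("FAIL", "SKIP", "WARN"),
  stringsAsFactors = TRUE
)

# Distinct values across all columns, read column by column
vals <- unique(as.vector(as.matrix(df2)))

# Drop NA so it is not used as a level
vals <- vals[!is.na(vals)]

# Re-create every column with the shared set of levels
df2[] <- lapply(df2, factor, levels = vals)
str(df2)
```

Note that the levels come out in order of first appearance in the data, so if a meaningful order such as PASS, WARN, FAIL matters, it is safer to reorder vals by hand before converting.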