What is the most effective (ie efficient / appropriate) way to clean up a factor containing multiple levels that need to be collapsed? That is, how to combine two or more fa
You may use the below function for combining/collapsing multiple factors:
combofactor <- function(pattern_vector,
replacement_vector,
data) {
levels <- levels(data)
for (i in 1:length(pattern_vector))
levels[which(pattern_vector[i] == levels)] <-
replacement_vector[i]
levels(data) <- levels
data
}
Example:
Initialize x
x <- factor(c(rep("Y",20),rep("N",20),rep("y",20),
rep("yes",20),rep("Yes",20),rep("No",20)))
Check the structure
str(x)
# Factor w/ 6 levels "N","No","y","Y",..: 4 4 4 4 4 4 4 4 4 4 ...
Use the function:
x_new <- combofactor(c("Y","N","y","yes"),c("Yes","No","Yes","Yes"),x)
Recheck the structure:
str(x_new)
# Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...