Question
I have a question about recoding data. I would like to use a lookup table and I am wondering how to recode NA and use an approach similar to %in%.
Sample data:
gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)
df_gender <- as.data.frame(gender)
df_gender$gender <- as.character(gender)
My first approach to recode is:
df_gender$gender[df_gender$gender == "Female"] <- "F"
df_gender$gender[df_gender$gender == "Male"] <- "M"
df_gender$gender[df_gender$gender %in% c("Unknown", "Not Disclosed", NA)] <- "Missing"
This approach works. However, it is tedious when there are many variables, and it quickly produces a lot of code. I would prefer to use a lookup table, as in this second attempt:
df_gender2 <- as.data.frame(gender)
df_gender2$gender <- as.character(gender)
gender_lookup <- c(Female = "F", Male = "M", Unknown = "Missing", "Not Disclosed" = "Missing")
df_gender2$gender <- gender_lookup[df_gender2$gender]
This recodes the values listed in the lookup table, but NA stays NA instead of becoming "Missing". I have two questions. First, is there a way to map both "Not Disclosed" and "Unknown" to "Missing" without typing the target value separately for each? Second, using a lookup table, is there a way to also recode NA to "Missing"?
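For illustration, here is one possible base R sketch that touches on both points. The names groups and recoded are hypothetical and not part of the original post. It assumes the lookup is written as a list keyed by the new code, so that several old values can share one target, and it handles NA in a separate step because indexing a named vector with NA returns NA.

```r
gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)

# Hypothetical grouping: each new code lists the old values it replaces,
# so "Missing" is typed only once.
groups <- list(
  F       = "Female",
  M       = "Male",
  Missing = c("Unknown", "Not Disclosed")
)

# Flatten the list into a named lookup vector (names = old, values = new).
gender_lookup <- setNames(
  rep(names(groups), lengths(groups)),
  unlist(groups, use.names = FALSE)
)

# Look up each value; unname() drops the names carried over from the lookup.
recoded <- unname(gender_lookup[gender])

# NA cannot be matched by name, so recode it explicitly.
recoded[is.na(gender)] <- "Missing"

recoded
# [1] "F"       "Missing" "Missing" "M"       "M"       "F"       "Missing"
```

Testing is.na(gender) rather than is.na(recoded) recodes only values that were NA in the input. Any value absent from the lookup would remain NA and so stay visible, instead of being silently folded into "Missing".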
Source: https://stackoverflow.com/questions/42823269/recoding-variables-in-r-with-a-lookup-table