In Python, scikit-learn has a handy class called LabelEncoder that maps the levels of a categorical variable (strings) to integers.
Is there an equivalent in R?
# Data
Country <- c("France", "Spain", "Germany", "Spain", "Germany", "France")
Age <- c(34, 27, 30, 32, 42, 30)
Purchased <- c("No", "Yes", "No", "No", "Yes", "Yes")
df <- data.frame(Country, Age, Purchased)
df
# Output
  Country Age Purchased
1  France  34        No
2   Spain  27       Yes
3 Germany  30        No
4   Spain  32        No
5 Germany  42       Yes
6  France  30       Yes
Using the CatEncoders package (Encoders for Categorical Variables):
library(CatEncoders)
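# Saving names of categorical variables
# (since R 4.0.0, data.frame() keeps strings as character, so check for both types)
factors <- names(which(sapply(df, function(x) is.factor(x) || is.character(x))))
# Label Encoder: levels are sorted alphabetically, then numbered from 1
for (i in factors){
  encode <- LabelEncoder.fit(df[, i])
  df[, i] <- transform(encode, df[, i])
}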
df
# Output
  Country Age Purchased
1       1  34         1
2       3  27         2
3       2  30         1
4       3  32         1
5       2  42         2
6       1  30         2
Using base R (the factor function), starting again from the original df:
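# Label Encoder (apply to the original, unencoded df)
# Each level is paired with the label at the same position; repeating a label
# across levels is accepted by recent versions of R
lvls <- c("France", "Spain", "Germany", "No", "Yes")
lbls <- c(1, 2, 3, 1, 2)
for (i in factors){
  # The result is an ordered factor; wrap in as.integer() if plain integers are needed
  df[, i] <- factor(df[, i], levels = lvls, labels = lbls, ordered = TRUE)
}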
df
# Output
  Country Age Purchased
1       1  34         1
2       2  27         2
3       3  30         1
4       2  32         1
5       3  42         2
6       1  30         2
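If you would rather not hard-code the levels, and also want to map the integers back to the original strings (roughly what inverse_transform does in scikit-learn), something along these lines might work. This is only a sketch in base R: the names cat_cols, encoders, encoded and decoded are made up for illustration, and the codes start at 1 rather than 0 as in scikit-learn.

```r
# Rebuild the example data so the sketch is self-contained
df <- data.frame(
  Country   = c("France", "Spain", "Germany", "Spain", "Germany", "France"),
  Age       = c(34, 27, 30, 32, 42, 30),
  Purchased = c("No", "Yes", "No", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Columns to encode: anything stored as character or factor
cat_cols <- names(df)[sapply(df, function(x) is.character(x) || is.factor(x))]

# "Fit": remember the sorted unique values of each categorical column
# (encoders is a hypothetical helper list, not part of any package)
encoders <- lapply(df[cat_cols], function(x) sort(unique(as.character(x))))

# "Transform": replace each value by its position in the stored levels
encoded <- df
for (col in cat_cols) {
  encoded[[col]] <- match(as.character(df[[col]]), encoders[[col]])
}

# "Inverse transform": index the stored levels with the integer codes
decoded <- encoded
for (col in cat_cols) {
  decoded[[col]] <- encoders[[col]][encoded[[col]]]
}
```

Because the levels are sorted, Country comes out as France = 1, Germany = 2, Spain = 3, the same mapping as in the CatEncoders example. Keeping encoders around also lets you apply the same mapping to new data with match().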