In python, scikit has a great function called LabelEncoder that maps categorical levels (strings) to integer representation.
Is there anything in R to do this?
Try CatEncoders package. It replicates the Python sklearn.preprocessing
functionality.
# variable to encode values
colors = c("red", "red", "blue", "green")
lab_enc = LabelEncoder.fit(colors)
# new values are transformed to NA
values = transform(lab_enc, c('red', 'red', 'yellow'))
values
# [1] 3 3 NA
# doing the inverse: given the encoded numbers return the labels
inverse.transform(lab_enc, values)
# [1] "red" "red" NA
I would add the functionality of reporting the non-matching labels with a warning.
PS: It also has the OneHotEncoder
function.