I am wondering if there is an easier way to create these variables than what I am doing. I am trying to turn the values of my vehicleType variable into variables themselves.
There is a helper function for this in the cobalt package called splitfactor(), which splits factors into dummy variables. You would run the following:
# Split each listed factor into 0/1 dummy columns
norm.knnN <- cobalt::splitfactor(norm.knnN,
                                 c("gearbox", "vehicleType",
                                   "fuelType", "brand", "notRepairedDamage"),
                                 drop.first = "if2")
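To see what this kind of split produces, here is a minimal base R sketch on a small made-up data frame. The helper make_dummies() and the example values are hypothetical; this is not cobalt's implementation, it only mimics the general shape of the output (one 0/1 column per level, named variable_level). Unlike splitfactor() with drop.first, it keeps every level, and the new columns are appended at the end rather than placed where the original column was.

```r
# Hypothetical helper (not part of cobalt): replace a factor/character column
# with one 0/1 indicator column per level, named "<var>_<level>"
make_dummies <- function(df, var) {
  f <- factor(df[[var]])
  for (lev in levels(f)) {
    df[[paste(var, lev, sep = "_")]] <- as.integer(f == lev)
  }
  df[[var]] <- NULL  # drop the original column
  df
}

# Made-up example data
vehicles <- data.frame(
  vehicleType = c("suv", "limousine", "kleinwagen", "suv"),
  price = c(12000, 8000, 3000, 15000)
)

res <- make_dummies(vehicles, "vehicleType")
print(res)
```

Each row has exactly one indicator set to 1, which is why a model only needs all but one of them.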
Setting drop.first = "if2" means that if a factor has exactly two levels (e.g., "yes" and "no"), one of its dummy variables is dropped, since it is perfectly redundant with the other one. Factors with more than two levels keep a dummy for every level.
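The redundancy is easy to check: for a two-level factor the two indicators always sum to 1, so either one can be recomputed from the other. A base R sketch on a made-up "yes"/"no" factor (the variable names are hypothetical):

```r
# Made-up two-level factor, mirroring the "yes"/"no" example above
damage <- factor(c("yes", "no", "no", "yes", "no"))
levels(damage)  # levels are sorted, so "no" is the first level

damage_no  <- as.integer(damage == "no")   # indicator for the first level
damage_yes <- as.integer(damage == "yes")  # indicator for the second level

# Every row has exactly one of the two indicators set ...
stopifnot(all(damage_no + damage_yes == 1))

# ... so the dropped column is fully recoverable from the kept one
identical(damage_no, 1L - damage_yes)  # TRUE
```

Keeping both columns in a regression with an intercept would make the design matrix rank-deficient, which is the practical reason to drop one.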
It is often cleaner to use dplyr's syntax. There are tutorials for learning it, for example https://genomicsclass.github.io/book/pages/dplyr_tutorial.html
Within dplyr, I usually use the case_when() function rather than ifelse() when there are several conditions. For example, take this base R code:
norm.knnN$gearbox[norm.knnN$gearbox=="automatic"] = 1
norm.knnN$gearbox[norm.knnN$gearbox=="manual"] = 0
With dplyr and case_when(), it becomes:
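library(dplyr)

# Assign the result back; mutate() does not modify norm.knnN in place
norm.knnN <- norm.knnN %>%
  mutate(gearbox = case_when(
    gearbox == "automatic" ~ 1,
    gearbox == "manual" ~ 0,
    TRUE ~ NA_real_))  # other values become NA; NA_real_ keeps the column numeric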
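If you would rather not load dplyr, the same recoding can be sketched in base R with a named lookup vector. This is a swapped-in technique, not the case_when() approach above, and the toy data frame standing in for norm.knnN is made up. Values missing from the lookup (including NA) come out as NA, matching the TRUE ~ NA_real_ branch.

```r
# Toy stand-in for norm.knnN (made-up values)
norm.knnN <- data.frame(gearbox = c("automatic", "manual", NA, "manual", "other"))

# Named lookup vector: name = original value, element = numeric code
gearbox_codes <- c(automatic = 1, manual = 0)

# as.character() guards against gearbox being a factor (indexing by a factor
# would use its integer codes); unknown names and NA inputs give NA;
# unname() drops the names that indexing copies over
norm.knnN$gearbox <- unname(gearbox_codes[as.character(norm.knnN$gearbox)])

print(norm.knnN$gearbox)
```

The lookup vector scales better than repeated assignments when a variable has many values, since each mapping is one entry in gearbox_codes.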