I need some help tidying my data. I'm trying to convert some integers to factors (but not all integers to factors). I think I can do this by selecting the variables in question.
Honestly, I'd do it like this:
library(dplyr)
df <- data.frame("LOC_ID" = c(1,2,3,4),
                 "STRS" = c("a","b","c","d"),
                 "UPC_CDE" = c(813,814,815,816))
df$LOC_ID <- as.factor(df$LOC_ID)
df$UPC_CDE <- as.factor(df$UPC_CDE)
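If more than a couple of columns need converting, the same base R idea can be applied in one step rather than one assignment per column. This is a minimal sketch, not part of the original answer; the vector name `cols` is my own, and `stringsAsFactors = FALSE` is added only so the example behaves the same on R versions before 4.0.

```r
# Same toy data as in the answer above
df <- data.frame(
  LOC_ID  = c(1, 2, 3, 4),
  STRS    = c("a", "b", "c", "d"),
  UPC_CDE = c(813, 814, 815, 816),
  stringsAsFactors = FALSE
)

# Names of the columns to convert; everything else is left untouched
cols <- c("LOC_ID", "UPC_CDE")

# lapply() returns a list of factors, which is assigned back column by column
df[cols] <- lapply(df[cols], as.factor)

str(df)
```

Only the columns named in `cols` change type, so `STRS` stays a character column.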
You can use mutate_at() instead. Here's an example using the iris data frame:
library(dplyr)
iris_factor <- iris %>%
  mutate_at(vars(Sepal.Width,
                 Sepal.Length),
            funs(factor))
As of dplyr 0.8.0, funs() is deprecated. Use list() instead, as in
library(dplyr)
iris_factor <- iris %>%
  mutate_at(vars(Sepal.Width,
                 Sepal.Length),
            list(factor))
And the proof:
> str(iris_factor)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
$ Sepal.Width : Factor w/ 23 levels "2","2.2","2.3",..: 15 10 12 11 16 19 14 14 9 11 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
As of dplyr 1.0.0, released on CRAN 2020-06-01, the scoped functions mutate_at(), mutate_if() and mutate_all() have been superseded thanks to the more generalizable across(). This means you can stay with just mutate(). The introductory blog post from April explains why it took so long to discover.
Toy example:
library(dplyr)
iris %>%
  mutate(across(c(Sepal.Width,
                  Sepal.Length),
                factor))
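The toy example covers the mutate_at() case. As a rough sketch of how the other two scoped verbs map onto across(), assuming dplyr 1.0.0 or later (the object names `at_equiv`, `if_equiv` and `all_equiv` are mine, chosen for illustration):

```r
library(dplyr)

# mutate_at(vars(...), f)  ->  across(c(...), f)
at_equiv <- iris %>%
  mutate(across(c(Sepal.Width, Sepal.Length), factor))

# mutate_if(predicate, f)  ->  across(where(predicate), f)
if_equiv <- iris %>%
  mutate(across(where(is.numeric), factor))

# mutate_all(f)  ->  across(everything(), f)
all_equiv <- iris %>%
  mutate(across(everything(), as.character))

# Inspect the resulting column classes
sapply(if_equiv, class)
```

In each case the column selection moves into the first argument of across() and the function into the second, so a single mutate() call covers all three patterns.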
In your case, you'd do this:
library(dplyr)
raw_data_tbl %>%
  # Wrap the predicate in where(); bare predicates are deprecated in tidyselect
  mutate(across(c(where(is.numeric),
                  -contains("units"),
                  -c(PRO_ALLOW, RTL_ACTUAL, REAL_PRICE, REAL_PRICE_HHU,
                     REBATE, RETURN_UNITS, UNITS_PER_CASE, Profit,
                     STR_COST, DCC, CREDIT_AMT)),
                factor))
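Since raw_data_tbl isn't shown, here is a small stand-in to check which columns such a selection actually picks up. Everything in `toy_tbl` is hypothetical: the `sales_units` column and all the values are invented, and only a single exclusion (`REBATE`) is used to keep it short. The sketch wraps the predicate in where(), which is how current tidyselect expects predicates to be passed.

```r
library(dplyr)

# Hypothetical stand-in for raw_data_tbl; columns and values are invented
toy_tbl <- tibble(
  LOC_ID      = c(1, 2, 3),
  UPC_CDE     = c(813, 814, 815),
  sales_units = c(10, 20, 30),
  REBATE      = c(0.5, 0.0, 1.5),
  STRS        = c("a", "b", "c")
)

# Start from all numeric columns, then drop anything matching "units"
# and the explicitly named measure column
toy_factored <- toy_tbl %>%
  mutate(across(c(where(is.numeric),
                  -contains("units"),
                  -c(REBATE)),
                factor))

# Inspect the resulting column classes
sapply(toy_factored, class)
```

Here `LOC_ID` and `UPC_CDE` should come back as factors, while `sales_units`, `REBATE` and `STRS` keep their original types. Note that contains() ignores case by default, so in the real table it would also match names such as RETURN_UNITS.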