Changing column types with dplyr

南旧 2021-02-11 03:28

I need some help tidying my data. I'm trying to convert some integer columns to factors (but not all of them). I think I can do it by selecting the variables in question

3 Answers
  • 2021-02-11 04:09

    Honestly, I'd do it like this:

    library(dplyr)
    
    # Small example data frame
    df <- data.frame(LOC_ID = c(1, 2, 3, 4),
                     STRS = c("a", "b", "c", "d"),
                     UPC_CDE = c(813, 814, 815, 816))
    
    # Convert only the chosen columns to factors
    df$LOC_ID <- as.factor(df$LOC_ID)
    df$UPC_CDE <- as.factor(df$UPC_CDE)
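
    If there are more than a couple of columns, repeating the assignment gets tedious. A minimal base R sketch of the same idea, assuming the column names are collected in a character vector (cols_to_factor is a made-up name for illustration):

    ```r
    # Same toy data as above
    df <- data.frame(LOC_ID = c(1, 2, 3, 4),
                     STRS = c("a", "b", "c", "d"),
                     UPC_CDE = c(813, 814, 815, 816))

    # Hypothetical helper vector: names of the columns to convert
    cols_to_factor <- c("LOC_ID", "UPC_CDE")

    # lapply() returns a list of factors, which is assigned back column by column
    df[cols_to_factor] <- lapply(df[cols_to_factor], as.factor)

    str(df)
    ```

    Columns not named in the vector are left untouched.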
    
  • 2021-02-11 04:11

    You can use mutate_at instead. Here's an example using the iris dataframe:

    library(dplyr)
    
    iris_factor <- iris %>%
      mutate_at(vars(Sepal.Width, 
                     Sepal.Length), 
                funs(factor))
    

    Edit 08/2020

    As of dplyr 0.8.0, funs() is deprecated. Use list() instead:

    library(dplyr)
    
    iris_factor <- iris %>%
      mutate_at(vars(Sepal.Width, 
                     Sepal.Length), 
                list(factor))
    

    And the proof:

    > str(iris_factor)
    'data.frame':   150 obs. of  5 variables:
     $ Sepal.Length: Factor w/ 35 levels "4.3","4.4","4.5",..: 9 7 5 4 8 12 4 8 2 7 ...
     $ Sepal.Width : Factor w/ 23 levels "2","2.2","2.3",..: 15 10 12 11 16 19 14 14 9 11 ...
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    
  • 2021-02-11 04:12

    As of dplyr 1.0.0, released on CRAN 2020-06-01, the scoped functions mutate_at(), mutate_if() and mutate_all() have been superseded by the more general across(). This means plain mutate() is all you need. The introductory blog post from April explains why it took so long to discover.

    Toy example:

    library(dplyr)
    
    iris %>%
      mutate(across(c(Sepal.Width, 
                      Sepal.Length),
                    factor))
    

    In your case, you'd do this:

    library(dplyr)
    
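    raw_data_tbl %>% 
      # Predicates such as is.numeric must be wrapped in where();
      # negative entries drop columns from the selection
      mutate(across(c(where(is.numeric),
                      -contains("units"),
                      -c(PRO_ALLOW, RTL_ACTUAL, REAL_PRICE, REAL_PRICE_HHU,
                         REBATE, RETURN_UNITS, UNITS_PER_CASE, Profit,
                         STR_COST, DCC, CREDIT_AMT)),
                    factor))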
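
    Since raw_data_tbl isn't available here, a small self-contained sketch of the same selection pattern may help. The data frame toy_tbl and all of its columns and values are invented for illustration, and it assumes dplyr 1.0.0 or later:

    ```r
    library(dplyr)

    # Hypothetical stand-in for raw_data_tbl; names and values are made up
    toy_tbl <- data.frame(
      STORE_ID    = c(1L, 2L, 3L),
      UPC_CDE     = c(813L, 814L, 815L),
      SALES_UNITS = c(10L, 20L, 30L),
      REAL_PRICE  = c(1.99, 2.49, 0.99),
      STRS        = c("a", "b", "c"),
      stringsAsFactors = FALSE
    )

    converted <- toy_tbl %>%
      mutate(across(c(where(is.numeric),   # start from every numeric column
                      -contains("units"),  # drop names containing "units" (case-insensitive by default)
                      -REAL_PRICE),        # drop an explicitly named column
                    factor))

    # Inspect the resulting column classes
    sapply(converted, class)
    ```

    Only STORE_ID and UPC_CDE should end up as factors here; SALES_UNITS and REAL_PRICE are excluded by the negative selections, and STRS is never selected because it isn't numeric.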
    