R dplyr: rename variables using string functions

说谎 2020-11-28 03:45

(Somewhat related question: Enter new column names as string in dplyr's rename function)

In the middle of a dplyr chain (%>%), I wou

8 answers
  • 2020-11-28 04:30

    I think you're looking at the documentation for plyr::rename, not dplyr::rename. You would do something like this with dplyr::rename:

    iris %>% rename_(.dots=setNames(names(.), tolower(gsub("\\.", "_", names(.)))))
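
    rename_() and its .dots argument were deprecated in later dplyr releases. A minimal sketch of the same rename, assuming dplyr 1.0.0 or newer where rename_with() is available (iris_renamed is just an illustrative name):

    ```r
    library(dplyr)

    # rename_with() applies a function to the column names;
    # fixed = TRUE treats "." as a literal dot rather than a regex wildcard
    iris_renamed <- iris %>%
      rename_with(~ tolower(gsub(".", "_", .x, fixed = TRUE)))

    names(iris_renamed)
    ```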
    
  • 2020-11-28 04:30

    For this particular [but fairly common] case, the function has already been written in the janitor package:

    library(dplyr)   # %>%, group_by(), summarise()
    library(tidyr)   # gather()
    library(janitor) # clean_names()
    
    iris %>% clean_names()
    
    ##   sepal_length sepal_width petal_length petal_width species
    ## 1          5.1         3.5          1.4         0.2  setosa
    ## 2          4.9         3.0          1.4         0.2  setosa
    ## 3          4.7         3.2          1.3         0.2  setosa
    ## 4          4.6         3.1          1.5         0.2  setosa
    ## 5          5.0         3.6          1.4         0.2  setosa
    ## 6          5.4         3.9          1.7         0.4  setosa
    ## .          ...         ...          ...         ...     ...
    

    so all together,

    iris %>% 
        clean_names() %>%
        gather(measurement, value, -species) %>%
        group_by(species,measurement) %>%
        summarise(avg_value = mean(value))
    
    ## Source: local data frame [12 x 3]
    ## Groups: species [?]
    ## 
    ##       species  measurement avg_value
    ##        <fctr>        <chr>     <dbl>
    ## 1      setosa petal_length     1.462
    ## 2      setosa  petal_width     0.246
    ## 3      setosa sepal_length     5.006
    ## 4      setosa  sepal_width     3.428
    ## 5  versicolor petal_length     4.260
    ## 6  versicolor  petal_width     1.326
    ## 7  versicolor sepal_length     5.936
    ## 8  versicolor  sepal_width     2.770
    ## 9   virginica petal_length     5.552
    ## 10  virginica  petal_width     2.026
    ## 11  virginica sepal_length     6.588
    ## 12  virginica  sepal_width     2.974
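
    In newer tidyr versions gather() is superseded by pivot_longer(). A hedged sketch of the same pipeline, assuming tidyr 1.0.0+ and dplyr 1.0.0+ (the .groups argument and the name iris_means are not part of the original answer):

    ```r
    library(dplyr)
    library(tidyr)
    library(janitor)

    iris_means <- iris %>%
      clean_names() %>%
      # reshape the four measurement columns into name/value pairs
      pivot_longer(-species, names_to = "measurement", values_to = "value") %>%
      group_by(species, measurement) %>%
      # .groups = "drop" returns an ungrouped result
      summarise(avg_value = mean(value), .groups = "drop")

    iris_means
    ```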
    
  • 2020-11-28 04:30

    Both select() and select_all() can be used to rename columns.

    If you wanted to rename only specific columns you can use select:

    iris %>% 
      select(sepal_length = Sepal.Length, sepal_width = Sepal.Width, everything()) %>% 
      head(2)
    
      sepal_length sepal_width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    

    rename does the same thing, just without having to include everything():

    iris %>% 
      rename(sepal_length = Sepal.Length, sepal_width = Sepal.Width) %>% 
      head(2)
    
      sepal_length sepal_width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    

    select_all() works on all columns and can take a function as an argument:

    iris %>% 
      select_all(tolower)
    
    iris %>% 
      select_all(~gsub("\\.", "_", .)) 
    

    or combining the two:

    iris %>% 
      select_all(~gsub("\\.", "_", tolower(.))) %>% 
      head(2)
    
      sepal_length sepal_width petal_length petal_width species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
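
    The two ideas above (picking specific columns, and renaming with a function) can also be combined. A small sketch, assuming dplyr 1.0.0 or newer, where rename_with() accepts a .cols selection; iris_partial is an illustrative name:

    ```r
    library(dplyr)

    # Apply the string function only to the columns whose names start with "Sepal";
    # all other columns keep their original names
    iris_partial <- iris %>%
      rename_with(
        ~ tolower(gsub(".", "_", .x, fixed = TRUE)),
        .cols = starts_with("Sepal")
      )

    names(iris_partial)
    ```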
    
  • 2020-11-28 04:32

    Here's a way around the somewhat awkward rename syntax:

    myris <- iris %>% setNames(tolower(gsub("\\.","_",names(.))))
    
  • 2020-11-28 04:35

    My eloquent attempt using base, stringr and dplyr:

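    EDIT: library(tidyverse) now attaches dplyr and stringr; magrittr still has to be loaded separately for the %<>% assignment pipe.

    library(tidyverse)
    library(magrittr) # tidyverse re-exports %>% but not %<>%, so attach magrittr explicitly
    # library(dplyr)
    # library(stringr)
    # library(magrittr)
    
    names(iris) %<>% # the assignment pipe writes the result back to names(iris)
        tolower() %>%
        str_replace_all("\\.", "_") # escape the dot; a bare "." is a regex that matches any character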
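
    One detail worth checking in pipelines like this: stringr patterns are regular expressions by default, so an unescaped "." matches every character, not just a literal dot. A small sketch of the difference (the variable names are only for illustration):

    ```r
    library(stringr)

    col <- "Sepal.Length"

    # Unescaped "." is a regex wildcard, so every character gets replaced
    regex_result <- str_replace_all(col, ".", "_")

    # Either escape the dot or mark the pattern as a fixed string
    escaped_result <- str_replace_all(col, "\\.", "_")
    fixed_result <- str_replace_all(col, fixed("."), "_")

    print(c(regex_result, escaped_result, fixed_result))
    ```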
    

    I do this for building functions with piping.

    my_read_fun <- function(x) {
        df <- read.csv(x)
        # clean the column names in place
        names(df) %<>%
            tolower() %>%
            str_replace_all("_", ".")
        # keep only the columns of interest (a, b, c, g are placeholders)
        df %<>%
            select(a, b, c, g)
        df
    }
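
    For comparison, a hedged sketch of the same read-and-clean pattern without the assignment pipe, assuming dplyr 1.0.0 or newer. read_and_clean(), its keep argument, and the toy CSV are hypothetical and only serve the illustration:

    ```r
    library(dplyr)

    # Hypothetical helper: read a CSV, lower-case the column names,
    # turn underscores into dots, and keep only the requested columns
    read_and_clean <- function(path, keep) {
      read.csv(path) %>%
        rename_with(~ gsub("_", ".", tolower(.x), fixed = TRUE)) %>%
        select(all_of(keep))
    }

    # Toy input written to a temporary file so the example is self-contained
    tmp <- tempfile(fileext = ".csv")
    write.csv(
      data.frame(A_One = 1:2, B_Two = 3:4, Other = 5:6),
      tmp,
      row.names = FALSE
    )

    cleaned <- read_and_clean(tmp, keep = c("a.one", "b.two"))
    names(cleaned)
    ```

    Returning the pipeline's value directly avoids the intermediate df variable, at the cost of being slightly harder to step through when debugging.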
    
  • 2020-11-28 04:44

    In case you don't want to write the regular expressions yourself, you could use

    • the snakecase package, which is very flexible,
    • janitor::make_clean_names() which has some nice defaults or
    • janitor::clean_names() which does the same as make_clean_names(), but works directly on data frames.

    Invoking them inside of a pipeline should be straightforward.

    library(magrittr)
    library(snakecase)
    
    iris %>% setNames(to_snake_case(names(.)))
    iris %>% tibble::as_tibble(.name_repair = to_snake_case)
    iris %>% purrr::set_names(to_snake_case)
    iris %>% dplyr::rename_all(to_snake_case)
    iris %>% janitor::clean_names()
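
    janitor::make_clean_names() is listed above but not shown in the pipeline examples. Since it takes and returns a character vector, it can be passed to a renaming verb like any other string function. A minimal sketch, assuming dplyr 1.0.0 or newer and a janitor version that exports make_clean_names(); iris_clean is an illustrative name:

    ```r
    library(dplyr)

    # make_clean_names() works on a character vector of names,
    # so rename_with() can apply it to all columns
    iris_clean <- iris %>%
      rename_with(janitor::make_clean_names)

    names(iris_clean)
    ```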
    
    