Adding prefix or suffix to most data.frame variable names in piped R workflow

Question

I want to add a suffix or prefix to most variable names in a data.frame, typically after they've all been transformed in some way and before performing a join. I don't have a way to do this without breaking up my piping.

For example, with this data:

library(dplyr)
set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"),5))

I want to get to this result (note variable names):

  class speed_mean_2014 power_mean_2014 force_mean_2014
1     a       0.5572500             0.8       0.5519802
2     b       0.2850798             0.6       1.0888116

My current approach is:

means14 <- dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.)))  

names(means14)[2:length(names(means14))] <- paste0(names(means14)[2:length(names(means14))], "_mean_2014")

Is there an alternative to that clunky last line that breaks up my pipes? I've looked at select() and rename() but don't want to explicitly specify each variable name, as I usually want to rename all except a single variable and might have a much wider data.frame than in this example.

I'm imagining a final piped command that approximates this made-up function:

appendname(cols = 2:n, str = "_mean_2014", placement = "suffix")

Which doesn't exist as far as I know.

Answer 1:

You can pass a function to rename_at, so do:

means14 <- dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_all(funs(mean(.))) %>%
  rename_at(vars(-class), function(x) paste0(x, "_mean_2014"))
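In more recent dplyr releases the same idea can be written with rename_with() and across(), which supersede the scoped _at and _all verbs. The following is only a minimal sketch, assuming dplyr 1.0.0 or later is installed and reusing the question's dat14; means14_alt is just an illustrative name.

```r
library(dplyr)

set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"), 5))

# Summarise every non-grouping column, then suffix everything except `class`
means14 <- dat14 %>%
  select(-ID) %>%
  group_by(class) %>%
  summarise(across(everything(), mean)) %>%
  rename_with(~ paste0(.x, "_mean_2014"), -class)

# Alternative: let across() build the new names while summarising
means14_alt <- dat14 %>%
  group_by(class) %>%
  summarise(across(-ID, mean, .names = "{.col}_mean_2014"))

names(means14)
```

Both variants keep the pipe unbroken; the .names form avoids the separate renaming step when the suffix is known at summarise time.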

Answer 2:

After further experimenting since posting this question, I've found that setNames() works inside a pipe because it returns the renamed data.frame:

dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.))) %>%
  setNames(c(names(.)[1], paste0(names(.)[-1],"_mean_2014"))) 

  class speed_mean_2014 power_mean_2014 force_mean_2014
1     a       0.5572500             0.8       0.5519802
2     b       0.2850798             0.6       1.0888116
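For an in-memory data frame or tibble, the setNames idea can be wrapped in a small helper that resembles the appendname() imagined in the question. This is a sketch only: appendname() and its arguments are hypothetical and do not exist in base R, dplyr, or magrittr.

```r
library(dplyr)

# Hypothetical helper, named after the made-up function in the question;
# it is not part of base R, dplyr, or magrittr.
appendname <- function(df, cols = seq_along(df), str = "",
                       placement = c("suffix", "prefix")) {
  placement <- match.arg(placement)
  old <- names(df)[cols]
  new <- if (placement == "suffix") paste0(old, str) else paste0(str, old)
  names(df)[cols] <- new
  df  # return the data frame so the pipe can continue
}

set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"), 5))

means14 <- dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_all(mean) %>%
  appendname(cols = -1, str = "_mean_2014")

names(means14)
```

Passing cols = -1 renames every column except the first, which matches the "all but one variable" case described in the question.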

Answer 3:

This is a bit quicker, but not totally what you want:

library(magrittr)  # provides the %<>% operator

dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.))) -> means14

names(means14)[-1] %<>% paste0("_mean_2014")

If you haven't used the %<>% operator (from magrittr) before, definitely check it out; it's a super-useful tool.

You can also use it to recompute or round columns in place, for example df$meancolumn %<>% round(). It comes up very often and saves you a lot of typing.
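To make the compound assignment concrete, here is a tiny sketch. The data frame df and its values are made up purely for illustration.

```r
library(magrittr)

# Illustrative data only
df <- data.frame(meancolumn = c(1.234, 5.678), other = c(10, 20))

# x %<>% f() is shorthand for x <- x %>% f()
df$meancolumn %<>% round(1)

# The same operator applied to a subset of the column names
names(df)[-1] %<>% paste0("_mean_2014")

df
```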

Answer 4:

As of February 2017 you can do this with the dplyr command rename_(...).

In this example you could do:

dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.))) %>%
  rename_(.dots = setNames(names(.)[-1], paste0(names(.)[-1], "_mean_2014")))

This is rather similar to the answer with setNames but works with tibbles too!

Answer 5:

This is more of a step back, but you might consider reshaping your data so that you can apply the function to multiple years at the same time. This will preserve tidiness. If you are going to end up comparing different years, it may make sense to keep the year as a separate variable in the data frame rather than storing it in the column names. You should be able to use summarise_ to get the mean_year behavior. See http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html

library(dplyr)
library(tidyr)
set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"),5))

dat14 %>%
  gather(variable, value, -ID, -class) %>%
  mutate(year = 2014) %>%
  group_by(class, year, variable) %>%
  summarise(mean = mean(value))
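The same reshaping can be sketched with pivot_longer(), which supersedes gather() in newer tidyr releases. The sketch assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later; long_means and wide_means are illustrative names. The last step shows how the long result could be spread back out if the suffixed column names are still needed for a join.

```r
library(dplyr)
library(tidyr)

set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"), 5))

# Long format: one row per class, year, and variable
long_means <- dat14 %>%
  pivot_longer(c(speed, power, force),
               names_to = "variable", values_to = "value") %>%
  mutate(year = 2014) %>%
  group_by(class, year, variable) %>%
  summarise(mean = mean(value), .groups = "drop")

# Optional: back to wide format, building names such as speed_mean_2014
wide_means <- long_means %>%
  pivot_wider(names_from = c(variable, year), values_from = mean,
              names_glue = "{variable}_mean_{year}")

names(wide_means)
```

Keeping the year as a column means several years can be stacked and summarised in one pass, with the renaming deferred to the final pivot_wider() call.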

Answer 6:

While Sam Firke's solution using setNames() is certainly the only solution that keeps an unbroken pipe, it will not work with the tbl objects from dplyr, since their column names are not accessible through the usual base R naming functions. Here is a function that you can also use within a pipe with tbl objects, thanks to this solution by hrbrmstr. It adds predefined prefixes and suffixes at the specified column indices. The default is all columns.

tbl.renamer <- function(tbl,prefix="x",suffix=NULL,index=seq_along(tbl_vars(tbl))){
  newnames <- tbl_vars(tbl) # Get old variable names
  names(newnames) <- newnames
  names(newnames)[index] <- paste0(prefix,".",newnames,suffix)[index] # create a named vector for .dots
  rename_(tbl,.dots=newnames) # rename the variables
}

Example usage (assuming auth_user is a tbl_sql object):

auth_user %>% tbl_vars
tbl.renamer(auth_user) %>% tbl_vars
auth_user %>% tbl.renamer %>% tbl_vars
auth_user %>% tbl.renamer(index = c(1,5)) %>% tbl_vars

Source: https://stackoverflow.com/questions/29948876/adding-prefix-or-suffix-to-most-data-frame-variable-names-in-piped-r-workflow

Tags

dplyr

magrittr