Adding prefix or suffix to most data.frame variable names in piped R workflow

我们两清 posted on 2019-12-17 10:55:09

Question


I want to add a suffix or prefix to most variable names in a data.frame, typically after they've all been transformed in some way and before performing a join. I don't have a way to do this without breaking up my piping.

For example, with this data:

library(dplyr)
set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"),5))

I want to get to this result (note variable names):

  class speed_mean_2014 power_mean_2014 force_mean_2014
1     a       0.5572500             0.8       0.5519802
2     b       0.2850798             0.6       1.0888116

My current approach is:

means14 <- dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.)))  

names(means14)[2:length(names(means14))] <- paste0(names(means14)[2:length(names(means14))], "_mean_2014")

Is there an alternative to that clunky last line that breaks up my pipes? I've looked at select() and rename() but don't want to explicitly specify each variable name, as I usually want to rename all except a single variable and might have a much wider data.frame than in this example.

I'm imagining a final piped command that approximates this made-up function:

appendname(cols = 2:n, str = "_mean_2014", placement = "suffix")

Which doesn't exist as far as I know.


Answer 1:


You can pass a function to rename_at, so you can do:

means14 <- dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_all(mean) %>%
  rename_at(vars(-class), function(x) paste0(x, "_mean_2014"))
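
In dplyr 1.0.0 and later, the scoped verbs such as summarise_all() and rename_at() are superseded, and funs() is deprecated. The following is a minimal sketch of the same pipeline written with across() and rename_with(), assuming dplyr >= 1.0.0 is installed and using the "_mean_2014" suffix from the question. It is one possible modern spelling, not the original answerer's code.

```r
library(dplyr)

set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"), 5))

means14 <- dat14 %>%
  select(-ID) %>%
  group_by(class) %>%
  # across(everything()) skips the grouping column automatically
  summarise(across(everything(), mean)) %>%
  # rename every column except the grouping variable
  rename_with(~ paste0(.x, "_mean_2014"), .cols = -class)

names(means14)
```

For a prefix instead of a suffix, swap the argument order inside paste0(), for example ~ paste0("mean_2014_", .x).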



Answer 2:


After additional experimenting since posting this question, I've found that the setNames function will work with the piping as it returns a data.frame:

dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.))) %>%
  setNames(c(names(.)[1], paste0(names(.)[-1],"_mean_2014"))) 

  class speed_mean_2014 power_mean_2014 force_mean_2014
1     a       0.5572500             0.8       0.5519802
2     b       0.2850798             0.6       1.0888116
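
The setNames() idea can be wrapped into something close to the made-up appendname() from the question. The helper below is a rough sketch in base R only; append_name and its arguments are hypothetical names chosen for illustration, and the small data frame is toy data rather than the output above.

```r
# Hypothetical helper (not part of base R or dplyr): adds a prefix or suffix
# to the selected columns and returns the data frame, so it can sit in a pipe.
append_name <- function(df, cols = seq_along(df), str = "",
                        placement = c("suffix", "prefix")) {
  placement <- match.arg(placement)
  old <- names(df)
  new <- old
  new[cols] <- if (placement == "suffix") {
    paste0(old[cols], str)
  } else {
    paste0(str, old[cols])
  }
  setNames(df, new)
}

# Toy data for illustration only
df <- data.frame(class = c("a", "b"), speed = c(0.5, 0.3), power = c(0.8, 0.6))

# Rename everything except the first column
renamed <- append_name(df, cols = -1, str = "_mean_2014")
names(renamed)
```

Because the function takes the data frame as its first argument and returns it, it can be used as the last step of a pipe, e.g. ... %>% append_name(cols = -1, str = "_mean_2014").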



Answer 3:


This is a bit quicker, but not totally what you want:

dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.))) -> means14 

names(means14)[-1] %<>% paste0("_mean_2014")

If you haven't used the %<>% operator (from the magrittr package) before, definitely check it out; it's a super useful tool.

You can also use it for recomputing or rounding columns, for example df$meancolumn %<>% round(). It comes up very often and saves you a lot of writing.
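
As a small illustration of the compound assignment pipe, here is a sketch on toy data, assuming the magrittr package is installed. The data frame df and its column meancolumn are made up for the example.

```r
library(magrittr)

# Toy data for illustration only
df <- data.frame(id = 1:3, meancolumn = c(1.234, 2.345, 3.456))

# Equivalent to: df$meancolumn <- round(df$meancolumn, 1)
df$meancolumn %<>% round(1)

# Equivalent to: names(df)[-1] <- paste0(names(df)[-1], "_mean_2014")
names(df)[-1] %<>% paste0("_mean_2014")

names(df)
```

The operator pipes the left-hand side into the right-hand expression and assigns the result back to the left-hand side, which is why it still needs its own statement rather than sitting inside a longer %>% chain.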




Answer 4:


As of February 2017 you can do this with the dplyr command rename_(...).

For this example you could do:

dat14 %>%
  group_by(class) %>%
  select(-ID) %>%
  summarise_each(funs(mean(.))) %>%
  rename_(.dots = setNames(names(.)[-1], paste0(names(.)[-1], "_mean_2014")))

This is rather similar to the answer using setNames but works with tibbles too!




Answer 5:


This is more of a step back, but you might consider reshaping your data so that you can apply the function to multiple years at the same time. This preserves tidiness. If you are going to end up comparing different years, it may make sense to keep the year as a separate variable in the data frame rather than storing it in the column names. You should be able to use summarise_ to get the mean_year behavior. See http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html

library(dplyr)
library(tidyr)
set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"),5))

dat14 %>% 
  gather(variable, value, -ID, -class) %>% 
  mutate(year = 2014) %>% 
  group_by(class, year, variable)%>% 
  summarise(mean = mean(value))
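
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A sketch of the same reshaping with the newer function follows, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed; the object name long_means is chosen here for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(1)
dat14 <- data.frame(ID = 1:10, speed = runif(10), power = rpois(10, 1),
                    force = rexp(10), class = rep(c("a", "b"), 5))

long_means <- dat14 %>%
  # stack speed, power and force into variable/value pairs
  pivot_longer(cols = -c(ID, class),
               names_to = "variable", values_to = "value") %>%
  mutate(year = 2014) %>%
  group_by(class, year, variable) %>%
  # .groups = "drop" returns an ungrouped result
  summarise(mean = mean(value), .groups = "drop")

long_means
```

With the year stored as a column, results for further years can be appended with bind_rows() and compared directly, instead of being encoded in column names.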



Answer 6:


While Sam Firke's solution using setNames() is certainly the only one that keeps the pipe unbroken, it will not work with tbl objects from dplyr, since their column names are not accessible through the usual base R naming functions. Here is a function that you can use within a pipe with tbl objects as well, thanks to a solution by hrbrmstr. It adds a predefined prefix and suffix at the specified column indices. The default is all columns.

tbl.renamer <- function(tbl,prefix="x",suffix=NULL,index=seq_along(tbl_vars(tbl))){
  newnames <- tbl_vars(tbl) # Get old variable names
  names(newnames) <- newnames
  names(newnames)[index] <- paste0(prefix,".",newnames,suffix)[index] # create a named vector for .dots
  rename_(tbl,.dots=newnames) # rename the variables
}

Example usage (assuming auth_user is a tbl_sql object):

auth_user %>% tbl_vars
tbl.renamer(auth_user) %>% tbl_vars
auth_user %>% tbl.renamer %>% tbl_vars
auth_user %>% tbl.renamer(index = c(1,5)) %>% tbl_vars


Source: https://stackoverflow.com/questions/29948876/adding-prefix-or-suffix-to-most-data-frame-variable-names-in-piped-r-workflow
