specify dplyr column names [duplicate]

末鹿安然 提交于 2019-11-29 16:36:16

问题


This question already has an answer here:

  • Group by multiple columns in dplyr, using string vector input 9 answers

How can I pass column names to dplyr if I do not know the column name, but want to specify it through a variable?

e.g. this works:

require(dplyr)
df <- as.data.frame(matrix(seq(1:9),ncol=3,nrow=3))
df$group <- c("A","B","A")
gdf <- df %.% group_by(group) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

But this does not

require(dplyr)
someColumn = "group"
df <- as.data.frame(matrix(seq(1:9),ncol=3,nrow=3))
df$group <- c("A","B","A")
gdf <- df %.% group_by(someColumn) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

回答1:


I just gave a similar answer over at Group by multiple columns in dplyr, using string vector input, but for good measure: functions that allow you to operate on columns using strings have been added to dplyr. These have the same name as the regular dplyr functions, but end in an underscore. The functions are described in detail in this vignette.

Given df and someColumn from the OP, this now works a treat:

gdf <- df %>% group_by_(someColumn) %>% summarise(m1=mean(V1),m2=mean(V2),m3=mean(V3))

Note that it is group_by_, rather than group_by, and the %>% operator is used as %.% is deprecated.




回答2:


Here's an answer to this straightforward question, obtained by picking through hadley's solution to his posted dupe.

gdf <- df %.% regroup( lapply( someColumn, as.symbol)) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

FWIW, my use case involved grouping by one variable column and one constant column. The solution to that is:

gdf <- df %.% regroup( lapply( c( 'constant_column', someColumn), as.symbol)) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))

Finally, the posted eval solution doesn't work. That just makes a new column whose values are all what someColumn evals to. I'm not yet cool enough to leave a comment or downvote it.




回答3:


You can use summarise_ as follow:

plotVar         = "Stocks_US_TotalCrudeOil"
dfBand <- mydf[ c( plotVar ,  "year", "week"  )  ] %>%
            filter ( year %in% bandYears )   %>%
            group_by (  week )   %>% 
            summarise_ (   ymini =  paste( "min(" ,  as.name(plotVar)  ,")"  ) 
                         , ymaxi =  paste( "max(" ,  as.name(plotVar)  ,")"  )     )
dfBand



回答4:


pollutant <- "sulfate"
summarise(data, mean(eval(as.symbol(pollutant)), na.rm = TRUE))

I was trying to ask the same question for my own problem. Then I found a solution to it. I encapsulate the expression with eval(as.symbol()).




回答5:


I expect you just have to use eval

require(dplyr)
someColumn = "group"
df <- as.data.frame(matrix(seq(1:9),ncol=3,nrow=3))
df$group <- c("A","B","A")
gdf <- df %.% group_by(eval(someColumn)) %.% summarise(m1 =mean(V1),m2 =mean(V2),m3 =mean(V3))


来源:https://stackoverflow.com/questions/21390141/specify-dplyr-column-names

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!