Use a character vector in the `by` argument

Submitted by 时光怂恿深爱的人放手 on 2020-07-03 03:24:09

Question


In R's data.table package, is there a way to pass a character vector to the by argument of a calculation?

Here is an example of the desired output, using mtcars:

 library(data.table)
 mtcars <- data.table(mtcars)
 ColSelect <- 'cyl' # One Column Option
 mtcars[,.( AveMpg = mean(mpg)), by = .(ColSelect)] # Doesn't work

 # Desired Output 
    cyl   AveMpg
 1:   6 19.74286
 2:   4 26.66364
 3:   8 15.10000

I know this is possible when assigning column names in j, by wrapping the vector in parentheses.

 ColSelect <- 'AveMpg' # Column to be assigned for average mpg value
 mtcars[,(ColSelect):= mean(mpg), by = .(cyl)]
 head(mtcars)

    mpg cyl disp  hp drat    wt  qsec vs am gear carb   AveMpg
1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 19.74286
2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 19.74286
3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 26.66364
4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 19.74286
5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 15.10000
6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 19.74286

What should I put in the by argument to achieve this?


Answer 1:


Try passing the variable to by directly, without wrapping it in .():

library(data.table)
mtcars <- data.table(mtcars)
ColSelect <- 'cyl' # One Column Option
mtcars[, AveMpg := mean(mpg), by = ColSelect] # Should work; adds AveMpg as a column by reference
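Note that := adds AveMpg to every row rather than returning one row per group. Here is a minimal sketch of the summarising form, which should match the layout of the desired output in the question. The names dt and res are introduced only for illustration, so that the built-in mtcars is not overwritten.

```r
library(data.table)

# dt and res are illustrative names, not from the original post
dt <- as.data.table(mtcars)
ColSelect <- "cyl"

# by receives the character variable directly (no .() wrapper),
# so the grouping uses the column that the string names
res <- dt[, .(AveMpg = mean(mpg)), by = ColSelect]
print(res)
```

The result has one row per distinct value of cyl, with the grouping column keeping its own name (cyl) rather than the name of the variable.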



Answer 2:


The by section of ?data.table says that by accepts:

  • a single character string containing comma separated column names (where spaces are significant since column names may contain spaces
    even at the start or end): e.g., DT[, sum(a), by="x,y,z"]
  • a character vector of column names: e.g., DT[, sum(a), by=c("x", "y")]

So yes, the approach in @cccmir's answer works. You can also use c() as @akrun mentioned, but that seems slightly extraneous unless you want multiple columns.

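The reason you cannot use the .() syntax is that in data.table .() is an alias for list(), and according to the same help entry for by, the list() form takes expressions of column names, not character strings. Inside .(), ColSelect is therefore evaluated as an expression yielding the string 'cyl' itself, not the cyl column.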
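To make the distinction concrete, here is a small sketch comparing the forms side by side. The object names (dt, by_dot, by_list, by_chr) are made up for this illustration.

```r
library(data.table)

# All object names below are illustrative
dt <- as.data.table(mtcars)

by_dot  <- dt[, .(AveMpg = mean(mpg)), by = .(cyl)]     # unquoted column name inside .()
by_list <- dt[, .(AveMpg = mean(mpg)), by = list(cyl)]  # the same call spelled with list()
by_chr  <- dt[, .(AveMpg = mean(mpg)), by = "cyl"]      # column name given as a string

# Compare the three results
print(all.equal(by_dot, by_list))
print(all.equal(by_dot, by_chr))
```

All three calls group on the cyl column; what differs is only whether by is given an expression (.() or list()) or a name as text.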

Going off the examples in the by help, if you want to group by multiple variables and pass the names as characters, you could do either of:

  1. mtcars[,.( AveMpg = mean(mpg)), by = "cyl,am"]
  2. mtcars[,.( AveMpg = mean(mpg)), by = c("cyl","am")]
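The same idea carries over when the column names live in a variable, which is the case the question asks about. A short sketch, assuming a hypothetical variable GroupCols that holds the names:

```r
library(data.table)

dt <- as.data.table(mtcars)   # illustrative name
GroupCols <- c("cyl", "am")   # hypothetical variable holding the grouping columns

# Character vector stored in a variable
res_vec <- dt[, .(AveMpg = mean(mpg)), by = GroupCols]

# Single comma-separated string (no spaces, since spaces are significant)
res_str <- dt[, .(AveMpg = mean(mpg)), by = "cyl,am"]

print(res_vec)
print(all.equal(res_vec, res_str))
```

Keeping the names in a vector is usually the easier form to build programmatically, since it avoids pasting strings together with commas.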


Source: https://stackoverflow.com/questions/48442422/use-a-character-vector-in-the-by-argument
