问题
I want to summarise several variables in data.table, output in wide format, output possibly as a list per variable. Since several other approaches did not work, I tried to do an outer lapply, giving the names of the variables as character vectors. I wanted to pass these in, using with=FALSE.
carsx=as.data.table(cars)
lapply( list(speed="speed",dist= "dist"), #error object 'ansvals' not found
function(x) carsx[,list(mean(x), min(x), max(x) ), with=FALSE ] )
Since this does not work, I tried the more simple approach without lapply.
carsx[,list(mean("speed"), min("speed"), max("speed") ), with=FALSE ] #error object 'ansvals' not found
This does not work either. Is there any way to do something like this? Is this behaviour of 'with' wanted? (I am aware that ?data.table
mentions with only to select columns, but in my case it would be useful to be able to transform them as well)
When with=FALSE, j is a vector of names or positions to select, similar to a data.frame. with=FALSE is often useful in data.table to select columns dynamically.
EDIT My aim is to get a summary per group in wide format, for different variables. I tried to extend the following, which works only for one variable, for a list of variables.
carsx[,list(mean(speed), min(speed), max(speed) ) ,by=(dist>50)
Lamentably SO doesnt let me post my other question. There I described that I want an output similiar to:
lapply( list(speed="speed",dist= "dist"),
function(x) do.call("as.data.frame", aggregate(cars[,x], list(class=cars$dist>50), FUN=summary) ) )
Expected Output would be something like:
$speed
V1 V2 V3
1: FALSE 12.96970 4 20
2: TRUE 20.11765 14 25
$dist
V1 V2 V3
1: FALSE 12.96970 4 20
2: TRUE 20.11765 14 25
回答1:
You can specify the columns with the .SDcols
parameter:
carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))),
.SDcols = c("speed", "dist")]
# speed dist
# 1: 15.4 42.98
# 2: 4.0 2.00
# 3: 25.0 120.00
carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))),
.SDcols = "speed"]
# speed
# 1: 15.4
# 2: 4.0
# 3: 25.0
回答2:
Building on Svens answer a combination of .SDcols, rbindlist, and outer and inner lapply did the trick. The inner lapply is necessary to access .SD.
lapply( list(speed="speed",dist= "dist"),
function(x) carsx[ , rbindlist(lapply(.SD, function(x) list(mean=mean(x), min=min(x), max=max(x)) )),
.SDcols = x,by= (dist>50)] )
Result:
$speed
dist mean min max
1: FALSE 12.96970 4 20
2: TRUE 20.11765 14 25
$dist
dist mean min max
1: FALSE 27.84848 2 50
2: TRUE 72.35294 52 120
来源:https://stackoverflow.com/questions/26844251/data-table-using-with-false-and-transforming-function-summary-function