I am trying to obtain counts of each combination of levels of two variables, \"week\" and \"id\". I\'d like the result to have \"id\" as rows, and \"week\" as columns, and t
The reason ddply
is taking so long is that the splitting by group is not run in parallel (only the computations on the 'splits'), therefore with a large number of groups it will be slow (and .parallel = T
) will not help.
An approach using data.table::dcast
(data.table
version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on default argument values and simply use:
library(data.table)
dcast(setDT(data), id ~ week)
# Using 'week' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
# id 1 2 3
# 1: 1 2 1 1
# 2: 2 0 0 1
Or setting the arguments explicitly:
dcast(setDT(data), id ~ week, value.var = "week", fun = length)
# id 1 2 3
# 1: 1 2 1 1
# 2: 2 0 0 1
For pre-data.table
1.9.2 alternatives, see edits.