Faster ways to calculate frequencies and cast from long to wide

2020-11-21 04:56

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, "week" as columns, and the counts of each combination as the cell values.

4 Answers
  • 2020-11-21 05:29

    The reason ddply is taking so long is that the splitting by group is not run in parallel (only the computations on the splits are), so with a large number of groups it will be slow, and .parallel = TRUE will not help.
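
    For reference (not necessarily the exact code from the question), a typical plyr formulation of this kind of count, assuming the data frame is called data, looks something like:

    library(plyr)
    # Count rows per (id, week) group; the group-splitting step is
    # single-threaded, which is what makes this slow with many groups.
    counts_long <- ddply(data, .(id, week), nrow)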

    An approach using data.table::dcast (data.table version >= 1.9.2) should be extremely efficient in time and memory. In this case, we can rely on default argument values and simply use:

    library(data.table) 
    dcast(setDT(data), id ~ week)
    # Using 'week' as value column. Use 'value.var' to override
    # Aggregate function missing, defaulting to 'length'
    #    id 1 2 3
    # 1:  1 2 1 1
    # 2:  2 0 0 1
    

    Or setting the arguments explicitly:

    dcast(setDT(data), id ~ week, value.var = "week", fun.aggregate = length)
    #    id 1 2 3
    # 1:  1 2 1 1
    # 2:  2 0 0 1
    

    For pre-data.table 1.9.2 alternatives, see edits.
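
    If you want to check the timing claim on data closer to your real size, a rough benchmark sketch along these lines can be used (the sizes, seed, and the reshape2 comparison below are just illustrative, and assume reshape2 is also installed):

    library(data.table)
    set.seed(1)
    # Simulate a long table with many id groups.
    big <- data.frame(id   = sample(1e5, 1e6, replace = TRUE),
                      week = sample(1:52, 1e6, replace = TRUE))
    # data.table::dcast on a copy, so 'big' itself is left untouched.
    system.time(dcast(setDT(copy(big)), id ~ week,
                      value.var = "week", fun.aggregate = length))
    # The reshape2 equivalent, for comparison.
    system.time(reshape2::dcast(big, id ~ week,
                                value.var = "week", fun.aggregate = length))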

  • 2020-11-21 05:36

    You don't need ddply for this. The dcast from reshape2 is sufficient:

    dat <- data.frame(
        id = c(rep(1, 4), 2),
        week = c(1:3, 1, 3)
    )
    
    library(reshape2)
    dcast(dat, id~week, fun.aggregate=length)
    
      id 1 2 3
    1  1 2 1 1
    2  2 0 0 1
    

    Edit: For a base R solution (other than table, as posted by Joshua Ulrich), try xtabs:

    xtabs(~id+week, data=dat)
    
       week
    id  1 2 3
      1 2 1 1
      2 0 0 1
    
  • 2020-11-21 05:36

    You could just use the table command:

    table(data$id, data$week)
    
        1 2 3
      1 2 1 1
      2 0 0 1
    

    If "id" and "week" are the only columns in your data frame, you can simply use:

    table(data)
    #    week
    # id  1 2 3
    #   1 2 1 1
    #   2 0 0 1
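
    If you need the result as a regular data frame rather than a table object, the usual base R conversion is as.data.frame.matrix, which keeps the wide layout (a plain as.data.frame would melt it back to long format with a Freq column):

    as.data.frame.matrix(table(data$id, data$week))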
    
  • 2020-11-21 05:46

    A tidyverse option could be:

    library(dplyr)
    library(tidyr)
    
    df %>%
      count(id, week) %>%
      pivot_wider(names_from = week, values_from = n, values_fill = list(n = 0))
      # spread(week, n, fill = 0)  # in older versions of tidyr
    
    #     id   `1`   `2`   `3`
    #   <dbl> <dbl> <dbl> <dbl>
    #1     1     2     1     1
    #2     2     0     0     1
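
    The backtick-quoted numeric column names can be awkward to refer to later; if you prefer syntactic names, pivot_wider()'s names_prefix argument prepends a label (the "week_" prefix below is just an example):

    df %>%
      count(id, week) %>%
      pivot_wider(names_from = week, values_from = n,
                  values_fill = list(n = 0), names_prefix = "week_")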
    

    Or using tabyl from janitor:

    janitor::tabyl(df, id, week)
    # id 1 2 3
    #  1 2 1 1
    #  2 0 0 1
    

    data

    df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L), week = c(1L, 2L, 3L, 
    1L, 3L)), class = "data.frame", row.names = c(NA, -5L))
    