Cumulative aggregates within tidyverse

Submitted by ◇◆丶佛笑我妖孽 on 2021-01-28 05:51:51

Question


Say I have a tibble (or data.table) that consists of two columns:

library(tibble)
a <- tibble(id = rep(c("A", "B"), each = 6), val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

Furthermore, I have a function called myfun that takes a numeric vector of arbitrary length as input and returns a single number. For example, you can think of myfun as the standard deviation.

Now I would like to add a third column to my tibble (called result) containing the output of myfun applied cumulatively to val, grouped by id. For example, the first entry of result should contain myfun(val[1]), the second should contain myfun(val[1:2]), and so on. In other words, I would like to implement a cumulative version of myfun.

Of course, there are plenty of easy solutions outside the tidyverse using loops and the like. But I am interested in a solution within the tidyverse or within the data.table framework.
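Neither answer below covers data.table, so here is a minimal sketch of the same prefix-window idea in that framework. It assumes myfun is sd (swap in any function that returns a single number); dt and result are illustrative names, not from the original post.

```r
library(data.table)

# Same data as in the question, built as a data.table
dt <- data.table(
  id  = rep(c("A", "B"), each = 6),
  val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1)
)

# Within each id, apply sd to val[1:i] for every end index i.
# .N is the number of rows in the current group; := adds the column by reference.
dt[, result := vapply(seq_len(.N), function(i) sd(val[seq_len(i)]), numeric(1)),
   by = id]

dt
```

vapply with numeric(1) insists on one numeric value per window, playing the same role that map_dbl plays in the tidyverse answer.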

Any help is appreciated.


Answer 1


You could do it this way:

library(tidyverse)

a %>% 
  group_by(id) %>%    # compute separately within each id
  mutate(y = map_dbl(seq_along(val), ~ sd(val[1:.x]))) %>%  # sd of val[1], val[1:2], ...
  ungroup()

# # A tibble: 12 x 3
#       id   val         y
#    <chr> <dbl>     <dbl>
#  1     A     1        NA
#  2     A     0 0.7071068
#  3     A     0 0.5773503
#  4     A     1 0.5773503
#  5     A     0 0.5477226
#  6     A     1 0.5477226
#  7     B     0        NA
#  8     B     0 0.0000000
#  9     B     0 0.0000000
# 10     B     1 0.5000000
# 11     B     1 0.5477226
# 12     B     1 0.5477226
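The same pattern can be wrapped in a reusable function so that any myfun can be plugged in. This is a sketch: cum_apply is a hypothetical helper name, and sum is used alongside sd only because its cumulative result is easy to check against cumsum.

```r
library(dplyr)
library(purrr)
library(tibble)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# Hypothetical helper: apply f to x[1], x[1:2], ..., x[1:n]
cum_apply <- function(x, f) {
  map_dbl(seq_along(x), ~ f(x[seq_len(.x)]))
}

res <- a %>%
  group_by(id) %>%
  mutate(y_sum = cum_apply(val, sum),  # should equal cumsum(val) within each id
         y_sd  = cum_apply(val, sd)) %>%
  ungroup()

res
```

Each step recomputes f over the whole prefix, so the cost grows quadratically with group size. When a function has a built-in cumulative version, such as cumsum() or dplyr::cummean(), that is the faster choice.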

Explanation

We first group by id, as is usual in tidyverse chains, and then use mutate rather than summarize, because we want to keep all the original, unaggregated rows.
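A quick sketch of the mutate versus summarize difference on the question's data (per_group and per_row are illustrative names):

```r
library(dplyr)
library(tibble)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# summarise(): collapses each group to a single row
per_group <- a %>% group_by(id) %>% summarise(s = sd(val))

# mutate(): keeps every row; the group-level value is repeated within the group
per_row <- a %>% group_by(id) %>% mutate(s = sd(val)) %>% ungroup()

nrow(per_group)  # 2
nrow(per_row)    # 12
```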

Here map_dbl loops over a vector of end indices. Because seq_along(val) is evaluated within each group, it is 1:6 for both groups.
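To confirm that seq_along(val) restarts in each group, one can materialise the indices as a column (end and idx are hypothetical names used only for illustration):

```r
library(dplyr)
library(tibble)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

idx <- a %>%
  group_by(id) %>%
  mutate(end = seq_along(val)) %>%  # evaluated per group, so it restarts at 1
  ungroup()

idx$end
```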

Functions in the map family accept the ~ (formula) shorthand for an anonymous function, in which the first argument is referred to as .x.
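For instance, the formula shorthand and an ordinary anonymous function give the same result (a small illustration, not from the answer):

```r
library(purrr)

with_tilde    <- map_dbl(1:3, ~ .x^2)             # .x is the current element
with_function <- map_dbl(1:3, function(i) i^2)    # same thing, spelled out

identical(with_tilde, with_function)  # TRUE
```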

Looping through these indices, we first compute sd(val[1:1]), i.e. sd(val[1]), which is NA because the sample standard deviation of a single value is undefined (it divides by n - 1 = 0); then sd(val[1:2]), and so on.
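The first two steps for group A can be checked directly in base R (val_A is a hypothetical name holding group A's values):

```r
val_A <- c(1, 0, 0, 1, 0, 1)  # group A from the question

sd(val_A[1:1])  # NA: a single observation has no sample standard deviation
sd(val_A[1:2])  # 0.7071068, i.e. sqrt(0.5)
```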

By design, map_dbl returns a double vector of the same length as its input, and that vector becomes the y column.
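A small illustration of that type guarantee, assuming purrr's behaviour of raising an error when a mapped result is not a single number; this matters if your myfun can return more than one value:

```r
library(purrr)

out <- map_dbl(1:3, ~ .x / 2)  # each call returns one number
is.double(out)                 # TRUE

# A function returning two values per call makes map_dbl stop with an error
res <- tryCatch(map_dbl(1:3, ~ c(.x, .x)), error = function(e) "error")
res
```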




Answer 2


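One can use zoo::rollapplyr with a dynamic width, i.e. a vector giving the window width at each position. Within each group, 1:n() or seq(n()) provides such a vector: at position i the right-aligned window has width i and therefore covers val[1:i].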
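To see why a width vector of 1, 2, ..., n produces cumulative windows, here is a sketch on a plain vector (x is a hypothetical example), using sum so the result can be checked against cumsum:

```r
library(zoo)

x <- c(1, 0, 0, 1, 0, 1)

# Width i at position i with right alignment: window i is x[1:i]
rolled <- rollapplyr(x, seq_along(x), sum)

rolled
cumsum(x)
```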

Let's apply it with sd to the data provided by the OP:

library(dplyr)
library(zoo)

a %>% group_by(id) %>%
  # window widths 1, 2, ..., n(): position i covers val[1:i]
  mutate(y = rollapplyr(val, 1:n(), sd))

#   # Groups: id [2]
#   id      val      y
#   <chr> <dbl>  <dbl>
#  1 A      1.00 NA    
#  2 A      0     0.707
#  3 A      0     0.577
#  4 A      1.00  0.577
#  5 A      0     0.548
#  6 A      1.00  0.548
#  7 B      0    NA    
#  8 B      0     0    
#  9 B      0     0    
# 10 B      1.00  0.500
# 11 B      1.00  0.548
# 12 B      1.00  0.548


Source: https://stackoverflow.com/questions/50599976/cumulative-aggregates-within-tidyverse
