Question
Say I have a tibble (or data.table) which consists of two columns:
a <- tibble(id = rep(c("A", "B"), each = 6), val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))
Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number. For example, you can think of myfun as being the standard deviation.
Now I would like to add a third column to my tibble (called result) which contains the output of myfun applied to val cumulatively, grouped by id.
For example, the first entry of result should contain myfun(val[1]). The second entry should contain myfun(val[1:2]), and so on.
I would like to implement a cumulative version of myfun.
Of course there are a lot of easy solutions outside the tidyverse using loops and whatnot. But I would be interested in a solution within the tidyverse or within the data.table framework.
Any help is appreciated.
Answer 1:
You could do it this way:
library(tidyverse)
a %>%
  group_by(id) %>%
  # for each end index .x, apply sd to the first .x values of the group
  mutate(y = map_dbl(seq_along(val), ~ sd(val[1:.x]))) %>%
  ungroup()
# # A tibble: 12 x 3
# id val y
# <chr> <dbl> <dbl>
# 1 A 1 NA
# 2 A 0 0.7071068
# 3 A 0 0.5773503
# 4 A 1 0.5773503
# 5 A 0 0.5477226
# 6 A 1 0.5477226
# 7 B 0 NA
# 8 B 0 0.0000000
# 9 B 0 0.0000000
# 10 B 1 0.5000000
# 11 B 1 0.5477226
# 12 B 1 0.5477226
Explanation
We first group, as is common in tidyverse chains, then use mutate rather than summarize, because we want to keep the same unaggregated rows.
The function map_dbl is used here to loop over a vector of end indices. seq_along(val) will be 1:6 for both groups here.
With functions from the map family we can use the ~ notation, which creates an anonymous function whose first argument is named .x.
Looping through these indices, we first compute sd(val[1:1]), which is sd(val[1]), which is NA (the standard deviation of a single value is undefined), then sd(val[1:2]), and so on.
map_dbl returns by design a vector of doubles, and these are stored in the y column.
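The index loop described above can also be sketched without any packages. This is a base R illustration, not part of the original answer: vapply stands in for map_dbl, ave stands in for group_by plus mutate, and cum_apply is a hypothetical helper name chosen here.

```r
# Same data as in the question, as a plain data.frame
a <- data.frame(id  = rep(c("A", "B"), each = 6),
                val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# cum_apply is a hypothetical helper: it applies f to
# x[1], x[1:2], ..., x[1:n] and returns one number per position
cum_apply <- function(x, f) {
  vapply(seq_along(x), function(i) f(x[seq_len(i)]), numeric(1))
}

# ave() splits val by id, applies the function within each group,
# and returns the results in the original row order
a$y <- ave(a$val, a$id, FUN = function(v) cum_apply(v, sd))
```

Any function that maps a numeric vector to a single number can be passed in place of sd, which is what the question's myfun is assumed to be.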
Answer 2:
One can use zoo::rollapplyr with a dynamic width (a vector of widths, one per row). To build the widths within each group, 1:n() or seq(n()) can be used.
Let's apply it with the function sd, using the data provided by the OP:
library(dplyr)
library(zoo)
a %>% group_by(id) %>%
mutate(y = rollapplyr(val, 1:n(), sd ))
# # Groups: id [2]
# id val y
# <chr> <dbl> <dbl>
# 1 A 1.00 NA
# 2 A 0 0.707
# 3 A 0 0.577
# 4 A 1.00 0.577
# 5 A 0 0.548
# 6 A 1.00 0.548
# 7 B 0 NA
# 8 B 0 0
# 9 B 0 0
# 10 B 1.00 0.500
# 11 B 1.00 0.548
# 12 B 1.00 0.548
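The question also asks about data.table, which neither answer above covers. The following is a minimal sketch of the same index loop in data.table syntax, assuming the data.table package is installed; the column name result is taken from the question, and sd stands in for myfun.

```r
library(data.table)

# Same data as in the question, as a data.table
dt <- data.table(id  = rep(c("A", "B"), each = 6),
                 val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# Within each id group, .N is the group size; for every end index i
# we apply sd to the first i values of val and add the result by reference
dt[, result := vapply(seq_len(.N),
                      function(i) sd(val[seq_len(i)]),
                      numeric(1)),
   by = id]
```

Because := modifies dt in place, no reassignment is needed; printing dt afterwards shows the new result column.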
Source: https://stackoverflow.com/questions/50599976/cumulative-aggregates-within-tidyverse