I am implementing a rolling sum calculation through dplyr, but in my database I have a number of variables that have only one or only a few observations, causing an (k is sm
library(dplyr)
dg %>%
arrange(site,year,animal) %>%
group_by(site,animal) %>%
mutate(rollsum=cumsum(count))
You can instead use RcppRoll::roll_sum, which returns NA if the sample size (n) is less than the window size (k).
set.seed(1)
dg$count = rpois(dim(dg)[1], 5)
library(RcppRoll)
library(dplyr)
dg %>%
arrange(site,year,animal) %>%
group_by(site, animal) %>%
mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA))
# site year animal count roll_sum
#1 Boston 2000 dog 4 NA
#2 Boston 2001 dog 5 9
#3 Boston 2002 dog 3 8
#4 Boston 2003 dog 9 12
#5 Boston 2004 dog 6 15
#6 New York 2000 dog 4 NA
#7 New York 2001 dog 8 12
#8 New York 2002 dog 8 16
#9 New York 2003 dog 6 14
#10 New York 2004 cat 2 NA
roll_sum from RcppRoll returns NA instead of raising an error wherever the number of data points is less than the window size.
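To make the NA-padding behaviour concrete, here is a minimal base R sketch of a right-aligned rolling sum that fills with NA whenever a full window is not available. The helper name strict_roll_sum is hypothetical and is not part of RcppRoll; it only mimics the behaviour seen in the output above, and makes no claim about how RcppRoll is implemented internally.

```r
# Hypothetical helper (not an RcppRoll function): right-aligned rolling sum
# that leaves NA wherever fewer than k values are available.
# Assumes k is a positive integer.
strict_roll_sum <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  # Only positions k..n have a complete window; shorter groups stay all NA.
  if (n >= k) {
    for (i in k:n) {
      out[i] <- sum(x[(i - k + 1):i])
    }
  }
  out
}

boston_dog <- strict_roll_sum(c(4, 5, 3, 9, 6), 2)  # Boston/dog counts from the output above
single_obs <- strict_roll_sum(2, 2)                 # one observation, shorter than the window
```

With the Boston/dog counts the first position is NA and the rest are pairwise sums; a group with a single observation comes back as a single NA rather than an error.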
However, if you want the sum of whatever data points are present, even when there are fewer of them than the window size, you can use the rollapplyr function from zoo.
library(zoo)
library(dplyr)
dg %>%
arrange(site,year,animal) %>%
group_by(site, animal) %>%
mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA)) %>%
mutate(rollapply_sum = rollapplyr(count, 2, sum, partial = TRUE))
rollapply_sum holds the original value, or the sum of the data points present, wherever there are fewer data points than the window size, instead of NA.
site year animal count roll_sum rollapply_sum
(fctr) (int) (fctr) (int) (dbl) (int)
1 Boston 2000 dog 4 NA 4
2 Boston 2001 dog 5 9 9
3 Boston 2002 dog 3 8 8
4 Boston 2003 dog 9 12 12
5 Boston 2004 dog 6 15 15
6 New York 2000 dog 4 NA 4
7 New York 2001 dog 8 12 12
8 New York 2002 dog 8 16 16
9 New York 2003 dog 6 14 14
10 New York 2004 cat 2 NA 2
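For comparison, the partial-window behaviour of rollapplyr(count, 2, sum, partial = TRUE) can be sketched in base R. The helper name partial_roll_sum is hypothetical and is not part of zoo; this is only an illustration of the idea (sum whatever values fall inside the right-aligned window), not zoo's implementation.

```r
# Hypothetical helper (not a zoo function): right-aligned rolling sum that
# sums however many values are available, up to a window of k.
# Assumes k is a positive integer.
partial_roll_sum <- function(x, k) {
  vapply(seq_along(x), function(i) {
    start <- max(1L, i - k + 1L)   # clip the window at the start of the vector
    as.numeric(sum(x[start:i]))
  }, numeric(1))
}

boston_dog_partial <- partial_roll_sum(c(4, 5, 3, 9, 6), 2)  # Boston/dog counts from the table above
single_obs_partial <- partial_roll_sum(2, 2)                 # New York/cat: one observation
```

Here the first position of each group returns the value itself, and a single-observation group returns that observation, which matches the rollapply_sum column in the table.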