I would like to calculate the rainfall that has fallen over the last three days for each grid square, and add this as a new column in my data.table. To be clear, I want to s
Late to the party, but a more recent version of the data.table package (1.12.8 for me) has the frollsum function, which accomplishes this a bit more cleanly than the earlier (but very much valid) answers:
library(data.table)
# making the data.table
rain <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10) # rainfall values to work with
square <- c(1,1,1,1,1,1,1,1,1,2) # the geographic grid square for the rainfall measurement
desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA ) # this is the result I'm looking for (the last NA as we are now on to the first day of the second grid square)
weather <- data.table(rain, square, desired_result) # making the data.table
# using `frollsum`
weather[, rain3 := frollsum(rain, n = 3), by = square][]
#> rain square desired_result rain3
#> 1: NA 1 NA NA
#> 2: NA 1 NA NA
#> 3: NA 1 NA NA
#> 4: 0 1 NA NA
#> 5: 0 1 NA NA
#> 6: 5 1 5 5
#> 7: 1 1 6 6
#> 8: 0 1 6 6
#> 9: 3 1 4 4
#> 10: 10 2 NA NA
Created on 2020-07-09 by the reprex package (v0.3.0)
Here's a quick and efficient solution using a recent data.table version (v1.9.6+):
weather[, rain_3 := Reduce(`+`, shift(rain, 0:2)), by = square]
weather
# rain square desired_result rain_3
# 1: NA 1 NA NA
# 2: NA 1 NA NA
# 3: NA 1 NA NA
# 4: 0 1 NA NA
# 5: 0 1 NA NA
# 6: 5 1 5 5
# 7: 1 1 6 6
# 8: 0 1 6 6
# 9: 3 1 4 4
# 10: 10 2 NA NA
The basic idea here is to shift the rain column by 0, 1 and 2 rows within each square and then sum the shifted copies row by row.
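To see what the two pieces do, here is a minimal sketch on a toy vector. The vector `x` and the names `lags` and `roll3` are made up for illustration and are not part of the question's data.

```r
library(data.table)

x <- c(0, 0, 5, 1, 0, 3)  # hypothetical toy vector

# shift() with several lags returns a list:
# x itself, x lagged by one row, and x lagged by two rows (padded with NA)
lags <- shift(x, 0:2)

# Reduce(`+`, ...) adds the list elements position by position;
# wherever one of the lagged copies is NA, the sum is NA as well
roll3 <- Reduce(`+`, lags)
roll3
```

The first two positions come out as NA because there are not yet three values to add, which is the same behaviour the `by = square` version shows at the start of each grid square.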
The rollapply solution (from the zoo package) would be done like this:
library(zoo)
weather[, rain_3 := rollapplyr(rain, 3, sum, fill = NA_real_), by = square]
giving:
rain square desired_result rain_3
1: NA 1 NA NA
2: NA 1 NA NA
3: NA 1 NA NA
4: 0 1 NA NA
5: 0 1 NA NA
6: 5 1 5 5
7: 1 1 6 6
8: 0 1 6 6
9: 3 1 4 4
10: 10 2 NA NA
I have simplified this answer based on a version of zoo that came out after this question was originally asked.
A dplyr solution:
library(dplyr)
weather %>%
group_by(square) %>%
mutate(rain_3 = rain + lag(rain) + lag(rain, n = 2L))
Result:
Source: local data table [10 x 4]
rain square desired_result rain_3
(dbl) (dbl) (dbl) (dbl)
1 NA 1 NA NA
2 NA 1 NA NA
3 NA 1 NA NA
4 0 1 NA NA
5 0 1 NA NA
6 5 1 5 5
7 1 1 6 6
8 0 1 6 6
9 3 1 4 4
10 10 2 NA NA
If you want to assign rain_3 to your dataset, you can use the %<>% operator from magrittr in your pipe:
library(magrittr)
weather %<>%
group_by......
weather[, rain_3 := filter(rain, rep(1, 3), sides = 1), by = list(square)]
#Error in filter(rain, rep(1, 3), sides = 1) :
# 'filter' is longer than time series
weather[, rain_3 := if(.N > 2) filter(rain, rep(1, 3), sides = 1) else NA_real_,
by = square]
# rain square desired_result rain_3
# 1: NA 1 NA NA
# 2: NA 1 NA NA
# 3: NA 1 NA NA
# 4: 0 1 NA NA
# 5: 0 1 NA NA
# 6: 5 1 5 5
# 7: 1 1 6 6
# 8: 0 1 6 6
# 9: 3 1 4 4
#10: 10 2 NA NA
Take care that dplyr is not loaded, because it masks stats::filter. If you need dplyr, you can call stats::filter explicitly.
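A minimal sketch of the namespaced call, rebuilding the question's data so the snippet runs on its own. The `as.numeric()` wrapper is an addition here, assumed to be harmless: it only drops the "ts" class that `stats::filter` returns.

```r
library(data.table)

# same rainfall data as in the question
weather <- data.table(
  rain   = c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10),
  square = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
)

# stats::filter is called by its full name, so it resolves to the
# time-series filter even when dplyr is attached
weather[, rain_3 := if (.N > 2) {
  as.numeric(stats::filter(rain, rep(1, 3), sides = 1))
} else {
  NA_real_  # groups with fewer than three rows cannot be filtered
}, by = square]

weather
```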
You have almost got the answer yourself. rollsum (or rollapply in your case) gives you a vector of length N-2, so you just have to fill the remaining cells with NAs. It can be done simply like this:
roll <- c(NA, NA, rollsum(yourvector, k = 3))
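As a rough illustration of the padding, here is a sketch on a toy vector without missing values. The contents of `yourvector` and the names `short`, `padded` and `padded2` are made up for this example; `fill` and `align` are arguments of zoo's `rollsum`.

```r
library(zoo)

yourvector <- c(0, 0, 5, 1, 0, 3)  # hypothetical data without NAs

# a window of 3 over N values yields N - 2 sums
short <- rollsum(yourvector, k = 3)

# pad the front by hand so the result lines up with the input
padded <- c(NA, NA, short)

# or let zoo do the padding: right-aligned windows, NA where the window is incomplete
padded2 <- rollsum(yourvector, k = 3, fill = NA, align = "right")

padded
padded2
```

As far as I recall, zoo's documentation notes that the default rollsum method does not handle NAs in the input, so for a column like rain in the question rollapply may be the safer choice.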
Here is how I do it. I am using roll_sum from the {RcppRoll} package, because it is much faster and deals with NAs more easily. The by argument from data.table lets you group the result by square.
library(RcppRoll)
weather[, rain_3 := if (.N > 2) c(NA, NA, roll_sum(rain, n = 3)) else NA_real_, by = square]
weather
rain square desired_result rain_3
1: NA 1 NA NA
2: NA 1 NA NA
3: NA 1 NA NA
4: 0 1 NA NA
5: 0 1 NA NA
6: 5 1 5 5
7: 1 1 6 6
8: 0 1 6 6
9: 3 1 4 4
10: 10 2 NA NA