calculating sum of previous 3 rows in R data.table (by grid-square)

前端 未结 6 1420
一生所求
一生所求 2020-12-03 23:32

I would like to calculate the rainfall that has fallen over the last three days for each grid square, and add this as a new column in my data.table. To be clear, I want to s

相关标签:
6条回答
  • 2020-12-04 00:08

    Late to the party, but a more recent version of data.table package (1.12.8 for me) has frollsum function that will accomplish this a bit more cleanly than earlier (but very much valid) answers:

    library (data.table)
    
    # making the data.table
    rain           <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10)  # rainfall values to work with
    square         <- c(1,1,1,1,1,1,1,1,1,2)               # the geographic grid square for the rainfall measurement
    desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA )  # this is the result I'm looking for (the last NA as we are now on to the first day of the second grid square)
    weather <- data.table(rain, square, desired_result)  # making the data.table
    
    # using `frollsum`
    weather[, rain3 := frollsum(rain, n = 3), by = square][]
    #>     rain square desired_result rain3
    #>  1:   NA      1             NA    NA
    #>  2:   NA      1             NA    NA
    #>  3:   NA      1             NA    NA
    #>  4:    0      1             NA    NA
    #>  5:    0      1             NA    NA
    #>  6:    5      1              5     5
    #>  7:    1      1              6     6
    #>  8:    0      1              6     6
    #>  9:    3      1              4     4
    #> 10:   10      2             NA    NA
    

    Created on 2020-07-09 by the reprex package (v0.3.0)

    0 讨论(0)
  • 2020-12-04 00:11

    Here's a quick and efficient solution using the latest data.table version (v 1.9.6+)

    weather[, rain_3 := Reduce(`+`, shift(rain, 0:2)), by = square]
    weather
    #     rain square desired_result rain_3
    #  1:   NA      1             NA     NA
    #  2:   NA      1             NA     NA
    #  3:   NA      1             NA     NA
    #  4:    0      1             NA     NA
    #  5:    0      1             NA     NA
    #  6:    5      1              5      5
    #  7:    1      1              6      6
    #  8:    0      1              6      6
    #  9:    3      1              4      4
    # 10:   10      2             NA     NA
    

    The basic idea here is to shift the rain column twice and then sum up the rows.

    0 讨论(0)
  • 2020-12-04 00:19

    The rollapply solution would be done like this:

    weather[, rain_3 := rollapplyr(rain, 3, sum, fill = NA_real_), by = square]
    

    giving:

        rain square desired_result rain_3
     1:   NA      1             NA     NA
     2:   NA      1             NA     NA
     3:   NA      1             NA     NA
     4:    0      1             NA     NA
     5:    0      1             NA     NA
     6:    5      1              5      5
     7:    1      1              6      6
     8:    0      1              6      6
     9:    3      1              4      4
    10:   10      2             NA     NA
    

    Update

    Have simplified based on version of zoo that came out since this question was originally asked.

    0 讨论(0)
  • 2020-12-04 00:19

    A dplyr solution:

    library(dplyr)
    weather %>% 
      group_by(square) %>% 
      mutate(rain_3 = rain + lag(rain) + lag(rain, n = 2L))
    

    Result:

    Source: local data table [10 x 4]
    
        rain square desired_result rain_3
       (dbl)  (dbl)          (dbl) (dbl)
    1     NA      1             NA    NA
    2     NA      1             NA    NA
    3     NA      1             NA    NA
    4      0      1             NA    NA
    5      0      1             NA    NA
    6      5      1              5     5
    7      1      1              6     6
    8      0      1              6     6
    9      3      1              4     4
    10    10      2             NA    NA
    

    If you want to assign rain3 to your dataset, you can use the %<>% symbol from maggritr in your pipe:

    library(magrittr)
    weather %<>%
      group_by......
    
    0 讨论(0)
  • 2020-12-04 00:28
    weather[, rain_3 := filter(rain, rep(1, 3), sides = 1), by = list(square)]  
    #Error in filter(rain, rep(1, 3), sides = 1) : 
    #  'filter' is longer than time series
    weather[, rain_3 := if(.N > 2) filter(rain, rep(1, 3), sides = 1) else NA_real_, 
            by = square] 
    #    rain square desired_result rain_3
    # 1:   NA      1             NA     NA
    # 2:   NA      1             NA     NA
    # 3:   NA      1             NA     NA
    # 4:    0      1             NA     NA
    # 5:    0      1             NA     NA
    # 6:    5      1              5      5
    # 7:    1      1              6      6
    # 8:    0      1              6      6
    # 9:    3      1              4      4
    #10:   10      2             NA     NA
    

    Take care that dplyr is not loaded because it masks filter. If you need dplyr, you can call stats::filter explicitly.

    0 讨论(0)
  • 2020-12-04 00:33

    You have almost got the answer yourself. rollsum (or rollapply in your case) gives you the vector of length N-2, so you just have to fill the desired cells with NAs. It can be simply done like this: roll<-c(NA,NA,rollsum(yourvector,k=3))

    Here is how I do it. I am using roll_sum from {RcppRoll} package, because it is much faster and deals with NAs easier. Simple by argument from data.table lets you group result by square.

    library(RcppRoll)
    weather[,rain_3:=if(.N>2){c(NA,NA,roll_sum(rain,n=3))}else{NA},by=square]
    weather
    
        rain square desired_result rain_3
     1:   NA      1             NA     NA
     2:   NA      1             NA     NA
     3:   NA      1             NA     NA
     4:    0      1             NA     NA
     5:    0      1             NA     NA
     6:    5      1              5      5
     7:    1      1              6      6
     8:    0      1              6      6
     9:    3      1              4      4
    10:   10      2             NA     NA
    
    0 讨论(0)
提交回复
热议问题