Rolling Mean/standard deviation with conditions

后端未结

关注

 3  720

I have a bit of a question about computing the Rolling Mean/standard deviation based on conditions. To be honest it is more of a syntax question, but since I think it is slo

相关标签:

3条回答

别那么骄傲

2021-01-07 04:48

There now also is a rolling mean function within data.table itself, please see github disscussion for details. The implementation is really straightforward.

DT[, rollmean := data.table::frollmean(x, n = 3, fill = 0, align = "right"), 
by = .(stock)]

A quick benchmarking of the two, shows that the data.table version is a bit quicker (most of the time).

library(microbenchmark)

microbenchmark(a = DT[, rollmean := data.table::frollmean(x, n = 3, fill = 0, align = "right"), 
                      by = .(stock)]
               , b = DT[, rollmean := rollmean(x, k = 3, fill = 0, align = "right"),
                            by = .(stock)]
, times = 100L

)

Unit: milliseconds
expr    min      lq     mean  median     uq     max neval cld
   a 1.5695 1.66605 2.329675 1.79340 2.1980 39.3750   100  a 
   b 2.6711 2.82105 3.660617 2.99725 4.3577 20.3178   100   b

0 讨论(0)

傲寒

2021-01-07 04:59

I think your problem is your use of the := function and that you use DT inside the square brackets. I assume your setup is something like:

> library(data.table)
> set.seed(83385668)
> DT <- data.table(
+   x     = rnorm(5 * 3), 
+   stock = c(sapply(letters[1:3], rep, times = 5)),
+   time  = c(replicate(3, 1:5)))
> DT
              x stock time
 1:  0.25073356     a    1
 2: -0.24408170     a    2
 3: -0.87475856     a    3
 4:  0.50843761     a    4
 5: -1.91331773     a    5
 6:  0.07850094     b    1
 7: -0.15922989     b    2
 8:  1.09806870     b    3
 9:  0.27995610     b    4
10:  0.45090842     b    5
11:  0.03400554     c    1
12: -0.34918734     c    2
13:  2.16602740     c    3
14: -0.04758261     c    4
15:  1.24869663     c    5

I am not sure where the roll_sd function is from. However, you can compute e.g. a rolling mean with the zoo library as follows:

> library(zoo)
> setkey(DT, stock, time) # make sure data is sorted by time
> DT[, rollmean := rollmean(x, k = 3, fill = 0, align = "right"), 
+    by = .(stock)]
> DT
              x stock time   rollmean
 1:  0.25073356     a    1  0.0000000
 2: -0.24408170     a    2  0.0000000
 3: -0.87475856     a    3 -0.2893689
 4:  0.50843761     a    4 -0.2034676
 5: -1.91331773     a    5 -0.7598796
 6:  0.07850094     b    1  0.0000000
 7: -0.15922989     b    2  0.0000000
 8:  1.09806870     b    3  0.3391132
 9:  0.27995610     b    4  0.4062650
10:  0.45090842     b    5  0.6096444
11:  0.03400554     c    1  0.0000000
12: -0.34918734     c    2  0.0000000
13:  2.16602740     c    3  0.6169485
14: -0.04758261     c    4  0.5897525
15:  1.24869663     c    5  1.1223805

or equivalently

> DT[, `:=`(rollmean = rollmean(x, k = 3, fill = 0, align = "right")), 
+    by = .(stock)]
> DT
              x stock time   rollmean
 1:  0.25073356     a    1  0.0000000
 2: -0.24408170     a    2  0.0000000
 3: -0.87475856     a    3 -0.2893689
 4:  0.50843761     a    4 -0.2034676
 5: -1.91331773     a    5 -0.7598796
 6:  0.07850094     b    1  0.0000000
 7: -0.15922989     b    2  0.0000000
 8:  1.09806870     b    3  0.3391132
 9:  0.27995610     b    4  0.4062650
10:  0.45090842     b    5  0.6096444
11:  0.03400554     c    1  0.0000000
12: -0.34918734     c    2  0.0000000
13:  2.16602740     c    3  0.6169485
14: -0.04758261     c    4  0.5897525
15:  1.24869663     c    5  1.1223805

0 讨论(0)

臣服心动

2021-01-07 05:13
I met the same problem calculating rolling standard in my data-processing process.So I viewed this site. And I think your problem is using DT$Midquotes not .SD$Midquotes. .SD is a data.table containing the Subset of x’s Data for each group. And roll_sd function is from package"RcppRoll". You can try this way.
```
DT[, (sd = roll_sd(.SD$Midquotes, 20, fill=0, align = "right")), by = .(Stock)]
```
0 讨论(0)
发布评论:

提交评论
- 加载中...