I have a bit of a question about computing the Rolling Mean/standard deviation based on conditions. To be honest it is more of a syntax question, but since I think it is slo
data.table now also has its own rolling mean function, frollmean; see the GitHub discussion for details. Using it is straightforward:
DT[, rollmean := data.table::frollmean(x, n = 3, fill = 0, align = "right"),
by = .(stock)]
A quick benchmark of the two shows that the data.table version is a bit quicker (most of the time).
library(microbenchmark)
library(zoo)  # rollmean() in `b` comes from zoo
microbenchmark(
  # a: data.table's built-in rolling mean
  a = DT[, rollmean := data.table::frollmean(x, n = 3, fill = 0, align = "right"),
         by = .(stock)],
  # b: zoo's rolling mean
  b = DT[, rollmean := rollmean(x, k = 3, fill = 0, align = "right"),
         by = .(stock)],
  times = 100L
)
Unit: milliseconds
 expr    min      lq     mean  median     uq     max neval cld
    a 1.5695 1.66605 2.329675 1.79340 2.1980 39.3750   100  a
    b 2.6711 2.82105 3.660617 2.99725 4.3577 20.3178   100   b
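Since the question also asks about a rolling standard deviation, here is a minimal sketch of how that could look with data.table alone, using frollapply with sd. The toy data and the rollsd column name are made up for illustration and are not from the original question. Note that frollapply calls an R function for every window, so it will generally be slower than the dedicated frollmean.

```r
library(data.table)

# Hypothetical toy data, only for illustration
DT <- data.table(
  stock = rep(c("a", "b"), each = 5),
  time  = rep(1:5, times = 2),
  x     = c(1, 2, 4, 7, 11, 3, 3, 3, 5, 9)
)
setkey(DT, stock, time)  # make sure data is sorted by time within each stock

# Rolling standard deviation over a window of 3 observations, per stock.
# Incomplete windows at the start of each group are filled with NA.
DT[, rollsd := frollapply(x, 3, sd, fill = NA, align = "right"),
   by = .(stock)]
DT
```

Filling with NA instead of 0 keeps the incomplete windows from being mistaken for real values; use fill = 0 if you want to match the output shown elsewhere in this thread.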
I think your problem is your use of the := function and that you use DT inside the square brackets. I assume your setup is something like:
> library(data.table)
> set.seed(83385668)
> DT <- data.table(
+ x = rnorm(5 * 3),
+ stock = c(sapply(letters[1:3], rep, times = 5)),
+ time = c(replicate(3, 1:5)))
> DT
              x stock time
 1:  0.25073356     a    1
 2: -0.24408170     a    2
 3: -0.87475856     a    3
 4:  0.50843761     a    4
 5: -1.91331773     a    5
 6:  0.07850094     b    1
 7: -0.15922989     b    2
 8:  1.09806870     b    3
 9:  0.27995610     b    4
10:  0.45090842     b    5
11:  0.03400554     c    1
12: -0.34918734     c    2
13:  2.16602740     c    3
14: -0.04758261     c    4
15:  1.24869663     c    5
I am not sure where the roll_sd function is from. However, you can compute e.g. a rolling mean with the zoo package as follows:
> library(zoo)
> setkey(DT, stock, time) # make sure data is sorted by time
> DT[, rollmean := rollmean(x, k = 3, fill = 0, align = "right"),
+ by = .(stock)]
> DT
              x stock time   rollmean
 1:  0.25073356     a    1  0.0000000
 2: -0.24408170     a    2  0.0000000
 3: -0.87475856     a    3 -0.2893689
 4:  0.50843761     a    4 -0.2034676
 5: -1.91331773     a    5 -0.7598796
 6:  0.07850094     b    1  0.0000000
 7: -0.15922989     b    2  0.0000000
 8:  1.09806870     b    3  0.3391132
 9:  0.27995610     b    4  0.4062650
10:  0.45090842     b    5  0.6096444
11:  0.03400554     c    1  0.0000000
12: -0.34918734     c    2  0.0000000
13:  2.16602740     c    3  0.6169485
14: -0.04758261     c    4  0.5897525
15:  1.24869663     c    5  1.1223805
or equivalently
> DT[, `:=`(rollmean = rollmean(x, k = 3, fill = 0, align = "right")),
+ by = .(stock)]
> DT
              x stock time   rollmean
 1:  0.25073356     a    1  0.0000000
 2: -0.24408170     a    2  0.0000000
 3: -0.87475856     a    3 -0.2893689
 4:  0.50843761     a    4 -0.2034676
 5: -1.91331773     a    5 -0.7598796
 6:  0.07850094     b    1  0.0000000
 7: -0.15922989     b    2  0.0000000
 8:  1.09806870     b    3  0.3391132
 9:  0.27995610     b    4  0.4062650
10:  0.45090842     b    5  0.6096444
11:  0.03400554     c    1  0.0000000
12: -0.34918734     c    2  0.0000000
13:  2.16602740     c    3  0.6169485
14: -0.04758261     c    4  0.5897525
15:  1.24869663     c    5  1.1223805
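The functional form of := is mostly useful when you want to add several columns in one call. As a rough sketch, assuming you want both a rolling mean and a rolling standard deviation, you could combine zoo's rollmean with zoo's more general rollapply. The toy data and the rollsd column name below are invented for illustration.

```r
library(data.table)
library(zoo)

# Hypothetical toy data, only for illustration
DT <- data.table(
  stock = rep(c("a", "b"), each = 4),
  time  = rep(1:4, times = 2),
  x     = c(1, 2, 3, 4, 2, 4, 6, 8)
)
setkey(DT, stock, time)  # make sure data is sorted by time

# Add two rolling statistics per stock in a single `:=` call
DT[, `:=`(
  rollmean = rollmean(x, k = 3, fill = NA, align = "right"),
  rollsd   = rollapply(x, width = 3, FUN = sd, fill = NA, align = "right")
), by = .(stock)]
DT
```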
I ran into the same problem when calculating a rolling standard deviation in my own data processing, which is how I found this question. I think your problem is that you use DT$Midquotes rather than .SD$Midquotes. .SD is a data.table containing the Subset of x’s Data for each group, whereas DT$Midquotes always refers to the whole column, regardless of the grouping. The roll_sd function comes from the RcppRoll package. You can try it this way:
library(RcppRoll)  # provides roll_sd()
DT[, .(sd = roll_sd(.SD$Midquotes, 20, fill = 0, align = "right")), by = .(Stock)]
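To see why referring to DT inside the square brackets goes wrong, the following small sketch compares what each expression sees within a group. The data and the column names n_full, n_group and n_sd are made up for the example.

```r
library(data.table)

# Hypothetical toy data: two stocks with three observations each
DT <- data.table(stock = rep(c("a", "b"), each = 3), x = 1:6)

res <- DT[, .(
  n_full  = length(DT$x),   # whole column, ignores the grouping
  n_group = length(x),      # only the rows of the current group
  n_sd    = length(.SD$x)   # same rows, accessed through .SD
), by = .(stock)]
res
```

Within each group, DT$x still has all 6 values, while x and .SD$x have only the 3 values belonging to that stock. Using the bare column name is usually the simplest choice.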