问题
My imported data contains 7 variables: Y
and X1
, X2
, X3
, X4
, X5
, X6
. I tried applying the rollapply
function in zoo
in order to run a rolling regression within an in-sample with a window of 262 obs. (work days in a year).
date Y X1 X2
1 10/1/07 -0.0080321720 4.690734e-03 3.333770e-03
2 10/2/07 0.0000000000 -2.818413e-03 5.418223e-03
3 10/3/07 0.0023158650 -4.178744e-03 -3.821100e-04
4 10/4/07 -0.0057491710 -5.071030e-03 -8.321550e-04
5 10/5/07 0.0073570500 3.065045e-03 5.179574e-03
6 10/8/07 0.0127708010 -7.278513e-03 1.145395e-03
7 10/9/07 0.0032661980 9.692267e-03 6.514035e-03
8 10/10/07 0.0013824430 1.161780e-04 2.676416e-03
9 10/11/07 0.0026607550 1.113179e-02 8.825719e-03
10 10/12/07 -0.0046362600 -2.453561e-03 -6.584070e-03
11 10/15/07 -0.0023757680 -7.829081e-03 -3.070540e-03
12 10/16/07 -0.0128673660 -4.619378e-03 -8.972126e-03
13 10/17/07 0.0016049760 1.276695e-03 5.349316e-03
14 10/18/07 -0.0044198970 -9.018499e-03 -1.215895e-02
15 10/19/07 -0.0011080330 -5.328661e-03 -7.131916e-03
16 10/22/07 -0.0024217970 -2.019539e-02 -2.021072e-02
17 10/23/07 0.0031270520 1.668604e-02 2.236130e-02
18 10/24/07 -0.0040367400 -1.061433e-02 -5.735703e-03
19 10/25/07 0.0001011170 1.346312e-02 1.036109e-02
20 10/26/07 0.0003032910 3.766526e-03 2.903628e-03
21 10/29/07 0.0004042450 1.416406e-02 2.527754e-03
22 10/30/07 -0.0012132240 -1.387166e-03 -8.202236e-03
23 10/31/07 0.0057497510 9.593904e-03 1.433401e-02
24 11/1/07 -0.0032238590 -1.648975e-02 -1.029199e-02
25 11/2/07 -0.0031330560 -7.737784e-03 -7.559498e-03
26 11/5/07 -0.0001012300 -7.877763e-03 -8.500554e-03
27 11/6/07 -0.0004050220 7.407770e-03 2.536320e-03
28 11/7/07 -0.0031444970 -5.904219e-03 -8.026064e-03
29 11/8/07 -0.0045822590 -3.712574e-03 -6.395584e-03
30 11/9/07 0.0016316540 -1.432552e-02 -1.741458e-02
31 11/12/07 -0.0019378860 -3.926583e-03 -4.543370e-03
32 11/13/07 0.0011223920 -1.952799e-03 -2.622112e-03
33 11/14/07 0.0008154940 8.687550e-06 1.085682e-03
34 11/15/07 0.0015272620 -1.549745e-02 -1.556172e-02
35 11/16/07 -0.0001017450 -5.578556e-03 -1.432244e-02
36 11/19/07 0.0014234880 -2.206707e-02 -3.537936e-02
37 11/20/07 -0.0010165700 1.643937e-02 5.140822e-03
38 11/21/07 -0.0008140010 -1.715961e-02 -2.756704e-02
39 11/22/07 -0.0008146640 -2.108098e-03 7.455698e-03
40 11/23/07 0.0008146640 1.266776e-02 1.615338e-02
41 11/26/07 0.0008140010 5.539814e-03 2.854080e-03
42 11/27/07 0.0006100660 -8.561106e-03 -9.720505e-03
43 11/28/07 -0.0015258640 3.392103e-02 2.132374e-02
44 11/29/07 -0.0006109980 6.109848e-03 1.045556e-02
45 11/30/07 0.0004073730 9.214342e-03 1.133690e-02
46 12/3/07 -0.0002036660 -7.006415e-03 -6.079820e-04
47 12/4/07 0.0002036660 -1.187605e-02 -2.554853e-02
48 12/5/07 0.0007125040 1.362121e-02 9.525618e-03
49 12/6/07 -0.0034655010 7.917348e-03 5.252105e-03
50 12/7/07 0.0018361730 -1.026832e-02 1.216898e-02
51 12/10/07 0.0013240310 3.347302e-03 1.143687e-02
52 12/11/07 0.0005087760 -3.433720e-03 2.373558e-03
53 12/12/07 0.0024385300 5.507930e-04 3.191504e-03
54 12/13/07 -0.0115336820 -1.793698e-02 -2.149447e-02
55 12/14/07 -0.0010271160 -2.307745e-03 -1.038483e-03
56 12/17/07 -0.0033969870 -1.822079e-02 -2.920662e-02
57 12/18/07 0.0000000000 -1.873297e-03 -7.061215e-03
58 12/19/07 -0.0004125410 -3.372400e-06 -7.879850e-03
59 12/20/07 0.0008249120 -6.227957e-03 -1.752460e-04
60 12/21/07 -0.0020635580 1.734991e-02 1.348190e-02
61 12/24/07 0.0003098050 0.000000e+00 0.000000e+00
62 12/25/07 0.0000000000 0.000000e+00 0.000000e+00
63 12/26/07 0.0001032470 0.000000e+00 0.000000e+00
64 12/27/07 0.0006192590 5.006783e-03 5.274480e-03
65 12/28/07 -0.0005160230 6.428153e-03 8.557260e-03
66 12/31/07 0.0000000000 0.000000e+00 0.000000e+00
67 1/1/08 0.0002064410 0.000000e+00 0.000000e+00
68 1/2/08 -0.0009293200 -6.023384e-03 -3.104400e-03
69 1/3/08 0.0027853730 -2.302511e-03 -2.759650e-03
70 1/4/08 0.0018526150 -2.149450e-02 -2.645257e-02
71 1/7/08 -0.0005142710 -4.445206e-03 -2.117698e-
1596 <NA> <NA> <NA>
the last line for some reason doesn't show the values,even though there are in the original excel file(X3,X4,X5,X6 are missing,since the columns are stacked on top of each other,I copied the batch from the top for the example.
My code is:
rollapply(ts, 262, lm(
Y~X1+X2+X3+X4+X5+X6+0, subset=1:floor(length(x)/2)),
align="right")
The error message I get is:
Error in eval(expr, envir, enclos) : object 'Y' not found
I really wonder why it can not find the Y
variable, since it is displayed in the time series dataset with the appropriate heading.
回答1:
It is not really clear what your data actually is (use dput(example_data)
to give reproducible examples).
But the lm
call in your example is simply doing the same regression over and over again (your x is not changing) and as josilber points out, it is supposed to be a function. Here is an example where all the data is in the data.frame allRegData
and it has at least two columns, one named y
and another named x
:
require(zoo)
rollapply(zoo(allRegData),
width=262,
FUN = function(Z)
{
t = lm(formula=y~x, data = as.data.frame(Z), na.rm=T);
return(t$coef)
},
by.column=FALSE, align="right")
回答2:
I guess your questions are
- that you want to apply rolling regression on 262 width window of data for roughly 6 years yielding 1572 which is close to your 1596 observations with six covariates.
- figure out how to solve your problem with
rollapply
. - an issue with loading in a data set from Excel.
It seems hard to help you with 3. since you do not provide the data set or the R code you use. Starting with 1., then you can use the rollRegres
package I have made. What you would do is something like this
#####
# simulate data
width <- 262L
n_yr <- 6L
n_covs <- 6L
set.seed(23847996)
X <- matrix(rnorm(n_yr * width), ncol = n_covs, dimnames = list(NULL, paste0("X", 1:n_covs)))
df <- data.frame(Y = rnorm(n_yr * width), X)
head(df)
#R Y X1 X2 X3 X4 X5 X6
#R 1 -1.1478 0.516 -1.381 0.776 -0.0992 1.254 -0.8444
#R 2 -0.0542 1.328 1.411 -1.206 0.2560 0.975 0.9534
#R 3 -0.8350 -1.402 1.190 0.591 -1.5928 0.330 1.0806
#R 4 -0.5902 0.937 -0.182 0.193 0.1575 0.217 -0.2613
#R 5 1.7891 -0.608 -1.090 0.180 -1.1765 -0.992 -0.8831
#R 6 2.0108 0.259 -0.129 0.261 1.6694 -1.822 0.0616
#####
# estimate coefs
library(rollRegres)
fits <- roll_regres(Y ~ X1 + X2 + X3 + X4 + X5 + X6, df, width = width)
tail(fits$coefs) # estimated coefs
#R (Intercept) X1 X2 X3 X4 X5 X6
#R 1567 0.006511 0.0557 -0.0243 0.00907 0.0234 -0.0370 -0.0553
#R 1568 0.005816 0.0569 -0.0233 0.00798 0.0248 -0.0370 -0.0560
#R 1569 0.006406 0.0566 -0.0231 0.00814 0.0243 -0.0385 -0.0555
#R 1570 0.004898 0.0519 -0.0213 0.00773 0.0323 -0.0391 -0.0588
#R 1571 0.002922 0.0532 -0.0211 0.00809 0.0307 -0.0377 -0.0609
#R 1572 0.000771 0.0538 -0.0262 0.00580 0.0309 -0.0363 -0.0658
Now, regarding 2. then you can do something like what Hans Roggeman shows but a version that works with multiple regression as you request
library(zoo)
c2 <- rollapply(
df, width = width, function(z){
coef(lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, as.data.frame(z)))
}, by.column = FALSE, fill = NA_real_, align = "right")
all.equal(fits$coefs, c2, check.attributes = FALSE) # gives the same
#R [1] TRUE
It is much slower though
microbenchmark::microbenchmark(
rollRegrs = roll_regres(Y ~ X1 + X2 + X3 + X4 + X5 + X6, df, width = width),
rollapply = rollapply(
df, width = width, function(z){
coef(lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, as.data.frame(z)))
}, by.column = FALSE, fill = NA_real_, align = "right"), times = 25)
#R Unit: milliseconds
#R expr min lq mean median uq max neval
#R rollRegrs 1.73 1.98 2.31 2.37 2.64 3.01 25
#R rollapply 1726.74 1798.37 1881.25 1834.00 1964.62 2178.58 25
The rollapply
version can be faster if you use lm.fit
but it it still slower than roll_regres
.
来源:https://stackoverflow.com/questions/23162937/the-rolling-regression-in-r-using-roll-apply