问题
y <- cumsum(rnorm(100,0,1)) # random normal, with small (1.0) drift.
y.ts <- ts(y)
x <- cumsum(rnorm(100,0,1))
x
x.ts <- ts(x)
ts.plot(y.ts,ty= "l", x.ts) # plot the two random walks
Regression.Q1 = lm(y~x) ; summary(lm2)
summary(Regression.Q1)
t.test1 <- (summary(Regression.Q1)$coef[2,3]) # T-test computation
y[t] = y[t-1] + epsilon[t]
epsilon[t] ~ N(0,1)
set.seed(1)
t=1000
epsilon=sample(c(-1,1), t, replace = 1) # Generate k random walks across time {0, 1, ... , T}
N=T=1e3
y=t(apply(matrix(sample(c(-1,1),N*T,rep=TRUE),ncol=T),1,cumsum))
y[1]<-0
for (i in 2:t) {
y[i]<-y[i-1]+epsilon[i]
}
I need to:
Repeat the process 1,000 times (Monte Carlo simulations), namely build a loop around the previous program and each time save the t statistics. You will have a sequence of 1;000 t-tests : S = (t-test1, t-test2, ... , t-test1000). Count the number of time the absolute value of the 1,000 t-tests > 1.96, the critical value at a 5% significance level. If the series were I(0) you would have found roughly 5%. It won't be the case here (spurious regression).
What do I need to add to save the respective coefficients ?
回答1:
Your posted code related to y[t] = y[t-1] + epsilon[t]
is not real working code, but I can see that you are trying to store all 1000 * 2 random walk. Actually there is no need to do this. We only care about t-score rather than what those realizations of random walk are.
For this kind of problem, where we aim to replicate a procedure a lot of times, it is handy to first write a function to execute such a procedure for a single time. You already had good working code for this; we just need to wrap it in a function (removing those unnecessary part like plot
):
sim <- function () {
y <- cumsum(rnorm(100,0,1))
x <- cumsum(rnorm(100,0,1))
coef(summary(lm(y ~ x)))[2,3]
}
This function takes no input; it only returns the t-score for one experiment.
Now, we are going to repeat this 1000 times. We can write a for
loop, but function replicate
is easier (read ?replicate
if necessary)
S <- replicate(1000, sim())
Note this will take some time, much slower than it should be for such a simple task, because both lm
and summary.lm
are slow. A much faster way will be shown later.
Now, S
is vector with 1000 values, which is the "a sequence of 1000 t-tests" you want. To get "the number of time the absolute value of the 1,000 t-tests > 1.96", we can just use
sum(abs(S) > 1.96)
# [1] 756
The result 756 is just what I get; you will get something different as the simulation is random. But it will always be quite a large number as expected.
A faster version of sim
:
fast_sim <- function () {
y <- cumsum(rnorm(100,0,1))
x <- cumsum(rnorm(100,0,1))
y <- y - mean(y)
x <- x - mean(x)
xty <- crossprod(x,y)[1]
xtx <- crossprod(x)[1]
b <- xty / xtx
sigma <- sqrt(sum((y - x * b) ^ 2) / 98)
b * sqrt(xtx) * sigma
}
This function computes simple linear regression without lm
, and t-score without summary.lm
.
S <- replicate(1000, fast_sim())
sum(abs(S) > 1.96)
# [1] 778
An alternative way is to use cor.test
:
fast_sim2 <- function () {
y <- cumsum(rnorm(100,0,1))
x <- cumsum(rnorm(100,0,1))
unname(cor.test(x, y)[[1]])
}
S <- replicate(1000, fast_sim())
sum(abs(S) > 1.96)
# [1] 775
Let's have a benchmark:
system.time(replicate(1000, sim()))
# user system elapsed
# 1.860 0.004 1.867
system.time(replicate(1000, fast_sim()))
# user system elapsed
# 0.088 0.000 0.090
system.time(replicate(1000, fast_sim2()))
# user system elapsed
# 0.308 0.004 0.312
cor.test
is much faster than lm
+ summary.lm
, but manual computation is even faster!
来源:https://stackoverflow.com/questions/39944139/monte-carlo-simulation-of-correlation-between-two-brownian-motion-continuous-ra