Say I have a data frame in R as follows:
set.seed(1)
X <- runif(50, 0, 1)
Y <- runif(50, 0, 1)
df <- data.frame(X, Y)
head(df)
You may be interested in the biglm function in the biglm package. It lets you fit a regression on a subset of the data and then update the model with additional data. It was designed for large datasets, so that only part of the data has to be in memory at any one time, but it also fits what you describe: wrap the updating step in a loop and you get a new fit after each batch of added observations. The summary method for biglm objects reports confidence intervals in addition to the coefficients and standard errors.
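To check that fitting in chunks gives the same answer as a single fit, you can fit the same model to the odd-numbered rows of iris, update it with the even-numbered rows, and compare against lm on all 150 rows. This is a minimal sketch. The odd/even split is an arbitrary choice made here so that each chunk contains all three species. It is not part of the original answer.

```r
library(biglm)

# Split iris into two chunks (odd and even rows) so each chunk contains all
# three species; only one chunk is passed to biglm()/update() at a time.
odd_rows  <- iris[seq(1, nrow(iris), by = 2), ]
even_rows <- iris[seq(2, nrow(iris), by = 2), ]

chunked <- biglm(Sepal.Width ~ Sepal.Length + Species, data = odd_rows)
chunked <- update(chunked, even_rows)

# Ordinary least squares on all 150 rows at once, for comparison
full <- lm(Sepal.Width ~ Sepal.Length + Species, data = iris)

# The coefficients should agree up to floating-point error
all.equal(unname(coef(chunked)), unname(coef(full)))
```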
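library(biglm)
# Initial fit on the first 20 rows
fit1 <- biglm(Sepal.Width ~ Sepal.Length + Species, data = iris[1:20, ])
summary(fit1)
# Keep the initial fit, then add rows 21..150 one at a time
out <- vector("list", 131)
out[[1]] <- fit1
for (i in 1:130) {
  out[[i + 1]] <- update(out[[i]], iris[i + 20, ])
}
# Coefficient table (Coef, CI limits, SE, p) for every fit
out2 <- lapply(out, function(x) summary(x)$mat)
# Row 2 = Sepal.Length; columns 2:3 = lower and upper CI limits
out3 <- sapply(out2, function(x) x[2, 2:3])
# One line per CI limit, plotted against the number of fits
matplot(t(out3), type = 'l')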
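Applied to the X/Y data frame from the question, the same pattern might look like the sketch below. The starting size of 10 rows (the variable start) and the name ci are choices made for this illustration, not taken from the original. The slope row is selected by name ("X") rather than by position. As far as I know, summary.biglm computes these limits as the estimate plus or minus two standard errors instead of using a t quantile, so treat them as approximate when only a few rows have been used.

```r
library(biglm)

# The question's data
set.seed(1)
X <- runif(50, 0, 1)
Y <- runif(50, 0, 1)
df <- data.frame(X, Y)

start <- 10  # hypothetical: number of rows used for the initial fit
fit <- biglm(Y ~ X, data = df[1:start, ])

# One row per model size: lower and upper CI limits for the slope on X
ci <- matrix(NA_real_, nrow = nrow(df) - start + 1, ncol = 2,
             dimnames = list(NULL, c("lower", "upper")))
ci[1, ] <- summary(fit)$mat["X", 2:3]

for (i in (start + 1):nrow(df)) {
  fit <- update(fit, df[i, ])  # add one more observation
  ci[i - start + 1, ] <- summary(fit)$mat["X", 2:3]
}

matplot(start:nrow(df), ci, type = "l", lty = 1,
        xlab = "observations used", ylab = "CI for slope of X")
```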
If you don't want to use an explicit loop, then the Reduce function can help:
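# Same initial fit as above
fit1 <- biglm(Sepal.Width ~ Sepal.Length + Species, data = iris[1:20, ])
# One single-row data frame per remaining row; rows 1-20 (group NA) are dropped
iris.split <- split(iris, c(rep(NA, 20), 1:130))
# accumulate = TRUE keeps every intermediate fit, not just the final one
out4 <- Reduce(update, iris.split, init = fit1, accumulate = TRUE)
out5 <- lapply(out4, function(x) summary(x)$mat)
out6 <- sapply(out5, function(x) x[2, 2:3])
# Expected TRUE: both approaches give the same sequence of intervals
all.equal(out3, out6)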
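The Reduce version relies on two base R behaviours that are easy to check in isolation. With accumulate = TRUE, Reduce returns every intermediate result, starting with init when it is supplied, which is why out4 has the same 131 elements as out. Also, split drops any element whose grouping value is NA. Here is a small sketch with toy inputs:

```r
# Reduce(f, x, accumulate = TRUE) returns every intermediate value:
# x[[1]], f(x[[1]], x[[2]]), f(f(x[[1]], x[[2]]), x[[3]]), ...
steps <- Reduce(function(acc, x) acc + x, 1:4, accumulate = TRUE)
steps   # values: 1 3 6 10

# With init supplied, init is the first element and every x[[i]] is folded in
steps0 <- Reduce(function(acc, x) acc + x, 1:4, accumulate = TRUE, init = 100)
steps0  # values: 100 101 103 106 110

# split() drops elements whose grouping value is NA, just as the first
# 20 rows of iris are left out of iris.split
pieces <- split(1:5, c(NA, NA, 1, 2, 3))
length(pieces)  # 3
```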