Fit a Bayesian linear regression and predict unobservable values

醉梦人生 2021-01-21 21:01

I'd like to use JAGS plus R to adjust a linear model with observable quantities, and make inference about unobservable ones. I found lots of examples on the internet about how t

1 answer
  •  南方客 (original poster)
    2021-01-21 21:42

    JAGS has powerful ways to make inference about missing data, and once you get the hang of it, it's easy! I strongly recommend that you check out Marc Kéry's excellent book which provides a wonderful introduction to BUGS language programming (JAGS is close enough to BUGS that almost everything transfers).

    The easiest way to do this involves, as you say, adjusting the model. Below I provide a complete worked example of how this works. But you seem to be asking for a way to get the prediction interval without re-running the model (is your model very large and computationally expensive?). This can also be done.

    How to predict--the hard way (without re-running the model)

    For each iteration of the MCMC, simulate the response for the desired x-value from that iteration's posterior draws of the model parameters. Suppose you want to predict a value for X=10. If iteration 1 (post burn-in) has slope=2, intercept=1, and standard deviation=0.5, draw a Y-value from

    Y=rnorm(1, 1+2*10, 0.5)  
    

    And repeat for iteration 2, 3, 4, 5... These will be your posterior draws for the response at X=10. Note: if you did not monitor the standard deviation in your JAGS model, you are out of luck and need to fit the model again.
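
    The per-iteration recipe above can be vectorised over all saved draws at once. Here is a minimal sketch, assuming the posterior draws sit in a matrix with columns named int, slope and sigma (with rjags, calling as.matrix() on the coda.samples output should give that layout when those three parameters are monitored). The draws matrix below is a simulated stand-in rather than real JAGS output, and the names draws, x.new and y.pred are made up for illustration.

    ```r
    set.seed(1)
    n.draws <- 3000

    # Stand-in for real posterior draws (an assumption, for illustration only);
    # in practice something like: draws <- as.matrix(my.model_samples)
    draws <- cbind(int   = rnorm(n.draws, 2, 0.10),
                   slope = rnorm(n.draws, 1, 0.15),
                   sigma = runif(n.draws, 0.4, 0.6))

    x.new <- 10  # x-value whose response we want to predict

    # One simulated response per MCMC iteration, using that iteration's parameters
    y.pred <- rnorm(n.draws,
                    mean = draws[, "int"] + draws[, "slope"] * x.new,
                    sd   = draws[, "sigma"])

    # 95% posterior prediction interval (and median) for Y at x.new
    pred.interval <- quantile(y.pred, probs = c(0.025, 0.5, 0.975))
    print(pred.interval)
    ```

    Because rnorm() is vectorised over mean and sd, no explicit loop over iterations is needed.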

    How to predict--the easy way--with worked example

    The basic idea is to insert (into your data) the x-values whose responses you want to predict, with the associated y-values set to NA. JAGS treats an NA response as an unobserved node and samples it along with the parameters. For example, if you want a prediction interval for X=10, you just have to include the point (10, NA) in your data, and set a trace monitor for the y-value.
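
    A minimal sketch of that data step, using made-up observations (the values and the names obs, dat, pred.idx and monitor.names are illustrative only). It appends the (10, NA) row and builds the node names to monitor. I believe rjags accepts indexed names such as "Y[4]" in variable.names, but treat that as an assumption; monitoring all of "Y" also works.

    ```r
    # Made-up observed data, for illustration only
    obs <- data.frame(X = c(0.2, 0.5, 0.9), Y = c(2.1, 2.6, 2.8))

    # The point to predict: known X, unknown (NA) Y
    new.rows <- data.frame(X = 10, Y = NA)
    dat <- rbind(obs, new.rows)

    # Positions of the responses the sampler will have to fill in
    pred.idx <- which(is.na(dat$Y))

    # Node names one could pass to the monitor, e.g. "Y[4]"
    monitor.names <- paste0("Y[", pred.idx, "]")
    print(monitor.names)
    ```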

    I use JAGS from R with the rjags package. Below is a complete worked example that begins by simulating the data, then adds some extra x-values to the data, specifies and runs the linear model in JAGS via rjags, and summarizes the results. Y[101:105] contains draws from the posterior predictive distributions at X[101:105]; their quantiles give the prediction intervals. Notice that Y[1:100] just contains the y-values for X[1:100]. These are the observed data that we fed to the model, and they never change as the model updates.

    library(rjags)
    set.seed(333)  # seed R's RNG before simulating so the data are reproducible
    # Simulate data (100 observations)
    my.data <- as.data.frame(matrix(data=NA, nrow=100, ncol=2))
    names(my.data) <- c("X", "Y")
    # the linear model will predict Y based on the covariate X
    
    my.data$X <- runif(100) # values for the covariate
    int <- 2     # specify the true intercept
    slope <- 1   # specify the true slope
    sigma <- .5   # specify the true residual standard deviation
    my.data$Y <- rnorm(100, slope*my.data$X+int, sigma)  # Simulate the data
    
    #### Extra data for prediction of unknown Y-values from known X-values
    y.predict <- as.data.frame(matrix(data=NA, nrow=5, ncol=2))
    names(y.predict) <- c("X", "Y")
    y.predict$X <- c(-1, 0, 1.3, 2, 7)
    
    mydata <- rbind(my.data, y.predict)
    
    
    set.seed(333)
    # setwd("INSERT YOUR WORKING DIRECTORY HERE")  # optional: where mymodel.txt gets written
    sink("mymodel.txt")
    cat("model{
    
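        # Priors (dnorm in JAGS takes a precision, 1/variance, not an sd)
    
        int ~ dnorm(0, .001)      # precision .001, i.e. an sd of about 31.6
        slope ~ dnorm(0, .001)
        sigma ~ dunif(0,10)       # residual standard deviation
        tau <- 1/(sigma * sigma)  # residual precision
    
        # Model structure
    
        for(i in 1:R){
            Y[i] ~ dnorm(m[i],tau)  # any Y[i] that is NA in the data is sampled
            m[i] <- int + slope * X[i]
        }
    }", fill=TRUE)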
    sink()
    jags.data <- list(R=dim(mydata)[1], X=mydata$X, Y=mydata$Y)
    
    inits <- function(){list(int=rnorm(1, 0, 5), slope=rnorm(1,0,5),
                             sigma=runif(1,0,10))}
    
    params <- c("Y", "int", "slope", "sigma")
    
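    nc <- 3           # number of chains
    n.adapt <- 1000   # adaptation iterations
    n.burn <- 1000    # burn-in iterations, discarded
    n.iter <- 10000   # sampling iterations per chain
    thin <- 10        # keep every 10th draw (1000 saved draws per chain)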
    my.model <- jags.model('mymodel.txt', data = jags.data, inits=inits, n.chains=nc, n.adapt=n.adapt)
    update(my.model, n.burn)
    my.model_samples <- coda.samples(my.model,params,n.iter=n.iter, thin=thin)
    summary(my.model_samples)
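
    Once the sampler has run, the prediction intervals can be read off the monitored Y[101:105] nodes. Here is a sketch of that step, assuming as.matrix(my.model_samples) yields one row per saved draw and columns named like "Y[101]". So that the snippet runs without JAGS, post below is a simulated stand-in for that matrix, not real output, and the names post, pred.cols and pred.summary are made up for illustration.

    ```r
    set.seed(2)
    pred.cols <- paste0("Y[", 101:105, "]")

    # Stand-in for: post <- as.matrix(my.model_samples)
    # (simulated around int + slope * X for X = -1, 0, 1.3, 2, 7; illustration only)
    post <- matrix(rnorm(3000 * 5,
                         mean = rep(c(1, 2, 3.3, 4, 9), each = 3000),
                         sd   = 0.5),
                   ncol = 5, dimnames = list(NULL, pred.cols))

    # 2.5%, 50% and 97.5% quantiles of each predicted response
    pred.summary <- t(apply(post[, pred.cols, drop = FALSE], 2,
                            quantile, probs = c(0.025, 0.5, 0.975)))
    print(round(pred.summary, 2))
    ```

    Each row of pred.summary is then a 95% prediction interval (with the median) for one of the extra x-values.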
    
