Question
I have a number of snowfall observations:
x <- c(98.044, 107.696, 146.050, 102.870, 131.318, 170.434, 84.836, 154.686,
162.814, 101.854, 103.378, 16.256)
and I was told that they follow a normal distribution with known standard deviation 25.4 but unknown mean mu. I have to make inference on mu using Bayes' formula.
This is the information on the prior of mu:
mean of snow | 50.8 | 76.2 | 101.6 | 127.0 | 152.4 | 177.8
------------------------------------------------------------
probability  | 0.1  | 0.15 | 0.25  | 0.25  | 0.15  | 0.1
The following is what I have tried so far, but the final line computing post does not work correctly: the resulting plot just gives a horizontal line.
library(LearnBayes)
midpts <- c(seq(50.8, 177.8, 30))
prob <- c(0.1, 0.15, 0.25, 0.25, 0.15, 0.1)
p <- seq(50, 180, length = 40000)
histp <- histprior(p, midpts, prob)
plot(p, histp, type = "l")
# posterior density
post <- round(histp * dnorm(x, 115, 42) / sum(histp * dnorm(x, 115, 42)), 3)
plot(p, post, type = "l")
Answer 1:
My first suggestion is: make sure you understand the statistics behind this. When I saw your
post <- round(histp * dnorm(x, 115, 42) / sum(histp * dnorm(x, 115, 42)), 3)
I reckoned you had mixed up several concepts. This appears to be Bayes' formula, but your code for the likelihood is wrong. The correct likelihood function is
## likelihood function: `L(obs | mu)`
## standard deviation is known (to keep the problem easy) at 25.4
Lik <- function (obs, mu) prod(dnorm(obs, mu, 25.4))
Note, mu is unknown, so it should be an argument of this function; also, the likelihood is the product of the individual probability densities at the observations. Now we can evaluate the likelihood, for example at mu = 100, by
Lik(x, 100)
# [1] 6.884842e-30
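As a side note: with only 12 observations the likelihood is already on the order of 1e-30, so with many more observations the product can underflow to zero. If you ever hit that, a log-scale sketch of the same likelihood avoids it (the name loglik below is just for illustration):
## log-likelihood: sum of log densities instead of a product of densities
loglik <- function (obs, mu) sum(dnorm(obs, mu, 25.4, log = TRUE))
loglik(x, 100)  ## exp() of this recovers Lik(x, 100), up to floating-point error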
For a successful R implementation, we need a vectorized version of the function Lik, that is, a function that can evaluate a vector input for mu rather than just a scalar input. I will just use sapply for vectorization:
vecLik <- function (obs, mu) sapply(mu, Lik, obs = obs)
Let's try
vecLik(x, c(80, 90, 100))
# [1] 6.248416e-34 1.662366e-31 6.884842e-30
Now it is time to obtain the prior distribution for mu. In principle this is a continuous function, but it looks like we want a discrete approximation to it, using histprior from the R package LearnBayes.
## prior distribution for `mu`: `prior(mu)`
midpts <- seq(50.8, 177.8, by = 25.4)  ## the 6 midpoints listed in the prior table
prob <- c(0.1, 0.15, 0.25, 0.25, 0.15, 0.1)
mu_grid <- seq(50, 180, length = 40000) ## a grid of `mu` for discretization
library(LearnBayes)
prior_mu_grid <- histprior(mu_grid, midpts, prob) ## discrete prior density
plot(mu_grid, prior_mu_grid, type = "l")
Before applying Bayes' formula, we first work out the normalizing constant NC in the denominator. This is the integral of Lik(obs | mu) * prior(mu) over mu. But as we have a discrete approximation to prior(mu), we use a Riemann sum to approximate this integral.
delta <- mu_grid[2] - mu_grid[1] ## division size
NC <- sum(vecLik(x, mu_grid) * prior_mu_grid * delta) ## Riemann sum
# [1] 2.573673e-28
Great, with everything ready, we can use Bayes' formula:
posterior(mu | obs) = Lik(obs | mu) * prior(mu) / NC
Again, as prior(mu) is discretized, posterior(mu) is discretized, too.
post_mu <- vecLik(x, mu_grid) * prior_mu_grid / NC
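One extra check worth doing at this point: since NC was computed with the same Riemann sum, the discretized posterior integrates to 1 on the grid, whatever scale histprior puts the prior on.
## sanity check: the Riemann sum of the posterior should be 1 (up to floating-point error)
sum(post_mu * delta)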
Haha, let's sketch the posterior of mu to see the inference result:
plot(mu_grid, post_mu, type = "l")
Wow, this is beautiful!!
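If you also want numerical summaries of the posterior rather than just a picture, the same Riemann-sum idea works. The following is only a sketch (the names post_mean, post_cdf and ci95 are for illustration, and the interval is an approximate equal-tailed one read off the grid):
## posterior mean via Riemann sum
post_mean <- sum(mu_grid * post_mu * delta)
## discretized posterior CDF, then an approximate equal-tailed 95% credible interval
post_cdf <- cumsum(post_mu * delta)
ci95 <- c(mu_grid[which.min(abs(post_cdf - 0.025))],
          mu_grid[which.min(abs(post_cdf - 0.975))])
post_mean
ci95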
Source: https://stackoverflow.com/questions/40189329/toy-r-code-on-bayesian-inference-for-mean-of-a-normal-distribution-data-of-snow