I am trying to understand the differences between the two methods `bayes` and `mle` in the `bn.fit` function of the package `bnlearn`.
Bayesian parameter estimation in `bnlearn::bn.fit` applies to discrete variables. The key is the optional `iss` argument: "the imaginary sample size used by the bayes method to estimate the conditional probability tables (CPTs) associated with discrete nodes".
So, for a binary root node `X` in some network, the `bayes` option in `bnlearn::bn.fit` returns `(Nx + iss / cptsize) / (N + iss)` as the probability of `X = x`, where `N` is your number of samples, `Nx` the number of samples with `X = x`, and `cptsize` the size of the CPT of `X`; in this case `cptsize = 2`. The relevant code is in the `bnlearn:::bn.fit.backend.discrete` function, in particular the line `tab = tab + extra.args$iss/prod(dim(tab))`.
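
As a quick check of that formula, here is a minimal sketch with a made-up one-node network, made-up data and an arbitrary `iss = 10`; the hand-computed value should match the CPT that `bn.fit` returns:

```r
library(bnlearn)

set.seed(1)
d   <- data.frame(X = factor(sample(c("a", "b"), 100, replace = TRUE,
                                    prob = c(0.7, 0.3))))
dag <- empty.graph("X")   # X is a root node, so cptsize = 2

iss <- 10
fit <- bn.fit(dag, d, method = "bayes", iss = iss)

# Manual computation of (Nx + iss / cptsize) / (N + iss)
N  <- nrow(d)
Nx <- table(d$X)
(Nx + iss / 2) / (N + iss)

fit$X$prob   # should give the same probabilities
```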
Thus, `iss / cptsize` is the number of imaginary observations for each entry in a CPT, as opposed to `N`, the number of 'real' observations. With `iss = 0` you would be getting a maximum likelihood estimate, as you would have no prior imaginary observations.
The higher `iss` is with respect to `N`, the stronger the effect of the prior on your posterior parameter estimates. With a fixed `iss` and a growing `N`, the Bayesian estimator and the maximum likelihood estimator converge to the same value.
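
You can see that convergence in a small sketch (the network, the data-generating probabilities and `iss = 10` are again arbitrary choices for illustration):

```r
library(bnlearn)

set.seed(42)
dag <- empty.graph("X")

for (N in c(20, 200, 2000)) {
  d <- data.frame(X = factor(sample(c("a", "b"), N, replace = TRUE,
                                    prob = c(0.7, 0.3)),
                             levels = c("a", "b")))
  p.mle   <- bn.fit(dag, d, method = "mle")$X$prob["a"]
  p.bayes <- bn.fit(dag, d, method = "bayes", iss = 10)$X$prob["a"]
  cat(sprintf("N = %4d   mle = %.3f   bayes = %.3f\n", N, p.mle, p.bayes))
}
```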
A common rule of thumb is to use a small non-zero `iss`, so that you avoid zero entries in the CPTs corresponding to combinations that were not observed in the data. Such zero entries can make a network generalize poorly, as was the case with some early versions of the Pathfinder system.
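
For instance, in this made-up two-node example `B = "b2"` is never observed together with `A = "a2"`, so the `mle` estimate of that CPT entry is zero, while a small `iss` keeps every entry strictly positive:

```r
library(bnlearn)

set.seed(7)
dag <- model2network("[A][B|A]")

A <- factor(sample(c("a1", "a2"), 50, replace = TRUE), levels = c("a1", "a2"))
B <- factor(ifelse(A == "a2", "b1",            # "b2" never occurs with "a2"
                   sample(c("b1", "b2"), 50, replace = TRUE)),
            levels = c("b1", "b2"))
d <- data.frame(A = A, B = B)

bn.fit(dag, d, method = "mle")$B$prob              # P(B = "b2" | A = "a2") is 0
bn.fit(dag, d, method = "bayes", iss = 1)$B$prob   # all entries strictly positive
```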
For more details on Bayesian parameter estimation you can have a look at the book by Koller and Friedman. I suppose many other Bayesian network books also cover the topic.