Question
I have learned and fitted a Bayesian network with the bnlearn R package and I wish to predict the value of its "event" node.
fl="data/discrete_kdd_10.txt"
h=TRUE
dtbl1 = read.csv(file=fl, head=h, sep=",")
net=hc(dtbl1)
fitted=bn.fit(net,dtbl1)
I want to predict the value of the "event" node based on the evidence stored in another file with the same structure as the file used for learning.
fileName="data/dcmp.txt"
dtbl2 = read.csv(file=fileName, head=h, sep=",")
predict(fitted,"event",dtbl2)
However, predict fails with
Error in check.data(data) : variable duration must have at least two levels.
I don't understand why there should be any restriction on the number of levels of variables in the evidence data.frame.
The dtbl2 data.frame contains only a few rows, one for each scenario in which I want to predict the "event" value.
I know I can use cpquery, but I wish to use the predict function also for networks with mixed variables (both discrete and continuous). I haven't found out how to make use of evidence on a continuous variable in cpquery.
Can someone please explain what I'm doing wrong with the predict function and how I should do it right?
Thanks in advance!
Answer 1:
The problem was that reading the evidence data.frame with
fileName="data/dcmp.txt"
dtbl2 = read.csv(file=fileName, head=h, sep=",")
predict(fitted,"event",dtbl2)
caused the categorical variables to become factors with a different number of levels (a subset of the levels in the original training set).
I used the following code to solve this issue.
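# Re-level each column of dtbl2 with the levels of the matching column in dtbl1
# (assumes both data frames have the same columns in the same order).
for (i in seq_len(ncol(dtbl2))) {
  dtbl2[[i]] = factor(dtbl2[[i]], levels = levels(dtbl1[[i]]))
}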
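To see the level mismatch in isolation, here is a minimal base R sketch that does not need bnlearn. The data frames `train` and `evidence` are made-up stand-ins for dtbl1 and dtbl2, and `align_levels` is a hypothetical helper, not part of any package. It matches columns by name rather than by position, which should be a bit more forgiving if the column order differs between the two files.

```r
# Made-up stand-in for the training data (dtbl1 in the question).
train <- data.frame(
  duration = factor(c("short", "long", "short", "medium")),
  event    = factor(c("normal", "attack", "normal", "attack"))
)

# Made-up stand-in for the evidence data (dtbl2). Every row has the same
# duration, so that factor ends up with a single level.
evidence <- data.frame(
  duration = factor(c("short", "short")),
  event    = factor(c("normal", "attack"))
)
nlevels(evidence$duration)

# Hypothetical helper: re-level each factor column of `new` using the
# levels of the column with the same name in `ref`.
align_levels <- function(new, ref) {
  for (col in intersect(names(new), names(ref))) {
    if (is.factor(ref[[col]])) {
      new[[col]] <- factor(as.character(new[[col]]),
                           levels = levels(ref[[col]]))
    }
  }
  new
}

aligned <- align_levels(evidence, train)
levels(aligned$duration)
```

Any value in the evidence that never occurred in the training data becomes NA after re-leveling, so it is worth checking for NAs before calling predict. Also note that since R 4.0.0 read.csv leaves strings as character unless stringsAsFactors = TRUE is passed, so on a recent R the columns may need converting to factors in the first place.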
By the way, the bnlearn package does fit models with mixed variables and also provides functions for prediction in them.
Source: https://stackoverflow.com/questions/30598739/error-in-bn-fit-predict-function-in-bnlear-r