Prediction with cpdist using “probabilities” as evidence


Question


I have a quick question, with an easily reproducible example, related to my work on prediction with bnlearn:

    library(bnlearn)

    #Toy data: a discrete Cause ("Yes"/"No") and a numeric Cons
    Learning.set4=cbind(c("Yes","Yes","Yes","No","No","No"),c(9,10,8,3,2,1))
    Learning.set4=as.data.frame(Learning.set4)
    Learning.set4[,c(2)]=as.numeric(as.character(Learning.set4[,c(2)]))
    colnames(Learning.set4)=c("Cause","Cons")

    #Structure: a single arc Cause -> Cons
    b.network=empty.graph(colnames(Learning.set4))
    struct.mat=matrix(0,2,2)
    colnames(struct.mat)=colnames(Learning.set4)
    rownames(struct.mat)=colnames(struct.mat)
    struct.mat[1,2]=1
    bnlearn::amat(b.network)=struct.mat

    #Fit the parameters of the network
    haha=bn.fit(b.network,Learning.set4)


    #Some predictions with "lw" method

    #Here is the approach I already know, with one particular modality SET.
    #(It is known with certainty; here, for example, I know Cause is "Yes".)
    classic_prediction=cpdist(haha,nodes="Cons",evidence=list("Cause"="Yes"),method="lw")
    print(mean(classic_prediction[,c(1)]))


    #What if I wanted to predict the value of Cons when Cause has a 60% chance of being "Yes" and a 40% chance of being "No"?
    #According to the help, I decided to do this.
    #(I could also write a function that generates "Yes" or "No" with the proper probabilities.)
    prediction_idea=cpdist(haha,nodes="Cons",evidence=list("Cause"=c("Yes","Yes","Yes","No","No")),method="lw")
    print(mean(prediction_idea[,c(1)]))
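
For reference, the fitted marginal distribution of Cause can be printed directly; this is the same table that the answer below ends up modifying.

    #Inspect the fitted marginal distribution of Cause
    print(haha$Cause$prob)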

Here is what the help says:

"In the case of a discrete or ordinal node, two or more values can also be provided. In that case, the value for that node will be sampled with uniform probability from the set of specified values"

When predicting with a categorical evidence variable, I have so far just used one fixed modality of that variable, as in the first prediction in the example. (Setting the evidence to "Yes" makes Cons take a high value.)

But if I wanted to predict Cons without knowing the modality of Cause with certainty, could I use what I did in the second prediction (supplying just the probabilities)? Is this an elegant way to do it, or are there better built-in approaches I don't know of?


Answer 1:


I got in touch with the creator of the package, and I will paste his answer to the question here:

The call to cpdist() is wrong:

    prediction_idea=cpdist(haha,nodes="Cons",evidence=list("Cause"=c("Yes","Yes","Yes","No","No")),method="lw")
    print(mean(prediction_idea[,c(1)]))

A query with the 40%-60% soft evidence requires you to place these new probabilities in the network first:

    haha$Cause = c(0.40, 0.60)

and then run the query without an evidence argument. (Because you do not have any hard evidence, really, just a different probability distribution for Cause.)


Here is the code that does what I wanted, starting from the fitted network in the example:

    #Replace the fitted marginal distribution of Cause with the soft-evidence probabilities
    change=haha$Cause$prob
    change[1]=0.4
    change[2]=0.6
    haha$Cause=change

    #Query Cons with no hard evidence; only the new distribution of Cause matters
    new_prediction=cpdist(haha,nodes="Cons",evidence=TRUE,method="lw")
    print(mean(new_prediction[,c(1)]))
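
As a rough sanity check: since Cause is a root node, the soft-evidence result should be close to the probability-weighted mixture of the two hard-evidence predictions, so something along these lines should give a similar number:

    #Weight the two hard-evidence predictions by 0.6 ("Yes") and 0.4 ("No")
    mean_yes=mean(cpdist(haha,nodes="Cons",evidence=list("Cause"="Yes"),method="lw")[,c(1)])
    mean_no=mean(cpdist(haha,nodes="Cons",evidence=list("Cause"="No"),method="lw")[,c(1)])
    print(0.6*mean_yes+0.4*mean_no)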


Source: https://stackoverflow.com/questions/41441812/prediction-with-cpdist-using-probabilities-as-evidence
