Question
Let's say I want to use the iris data example, but correctly classifying versicolor is 5 times more important to me.
library(party)
data(iris)
irisct <- ctree(Species ~ ., data = iris, weights = ifelse(iris$Species == "versicolor", 5, 1))
plot(irisct)
Then the tree graph changes the number of observations and conditional probabilities in each node (it multiplies versicolor by 5). Is there a way to "disable" this, i.e. show the original number of observations (total = 150 for iris)?
Many thanks for your help!
Answer 1:
The enhanced reimplementation of ctree() in package partykit also has somewhat more flexible plotting capabilities. Specifically, the node_barplot() panel function gained a mainlab argument that can be used for customizing the main labels. For example, for the iris data:
library("partykit")
ct <- ctree(Species ~ ., data = iris)
You can set up a vector of labels and then supply a function that accesses these:
lab <- paste("Foo", 1:7)
ml <- function(id, nobs) lab[as.numeric(id)]
plot(ct, tp_args = list(mainlab = ml))
Of course, the example above is not very meaningful but could be modified to accomplish what you want with a little bit of coding.
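As a rough sketch of that modification (not part of the original answer, and assuming that the id passed to mainlab is the node's name, as in the example above): fit the weighted tree, assign each of the 150 original rows to its terminal node with predict(..., type = "node"), and look up the unweighted counts in the label function. The names w, ctw, n_orig and ml_orig are made up for illustration.

```r
library("partykit")

# Case weights from the question: versicolor counts 5 times
w <- ifelse(iris$Species == "versicolor", 5, 1)
ctw <- ctree(Species ~ ., data = iris, weights = w)

# Terminal node of every original (unweighted) observation
node_id <- predict(ctw, newdata = iris, type = "node")

# Unweighted number of observations per terminal node
n_orig <- table(node_id)

# Label function: ignore the weighted nobs and show the original count
ml_orig <- function(id, nobs) {
  sprintf("Node %s (n = %s)", id, as.integer(n_orig[as.character(id)]))
}

plot(ctw, tp_args = list(mainlab = ml_orig))
```

This only changes the n shown in the panel titles. The bars themselves should still reflect the weighted class proportions, so relabeling does not undo the effect of the weights on the displayed probabilities.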
However, be warned about upsampling certain observations using the weights argument. The ctree() function really treats the weights as case weights, and consequently the significance tests used for splitting do change. With an increased number of observations, all p-values become smaller and hence the tree selects more splits (unless mincriterion is increased simultaneously). Compare the ct tree above, which has 4 terminal nodes, with:
ct2 <- ctree(Species ~ ., data = iris, weights = rep(2, 150))
ct3 <- ctree(Species ~ ., data = iris, weights = rep(2, 150), mincriterion = 0.999)
The resulting numbers of terminal nodes are:
c(width(ct), width(ct2), width(ct3))
[1] 4 6 4
Source: https://stackoverflow.com/questions/27260838/ctree-classification-with-weights-results-displayed