Question
I apologize in advance if I butcher this question as I'm very new to R and statistical analysis in general.
I've generated a conditional inference tree using the party library.
When I call plot(my_tree, type = "simple"), I get a result like this:
(image not included)
When I call print(my_tree), I get a result like this:
1) SOME_VALUE <= 2.5; criterion = 1, statistic = 1306.478
  2) SOME_VALUE <= -10.5; criterion = 1, statistic = 173.416
    3) SOME_VALUE <= -16; criterion = 1, statistic = 19.385
      4)* weights = 275
    3) SOME_VALUE > -16
      5)* weights = 261
  2) SOME_VALUE > -10.5
    6) SOME_VALUE <= -2.5; criterion = 1, statistic = 24.094
      7) SOME_VALUE <= -6.5; criterion = 0.974, statistic = 4.989
        8)* weights = 346
      7) SOME_VALUE > -6.5
        9)* weights = 563
    6) SOME_VALUE > -2.5
      10)* weights = 442
1) SOME_VALUE > 2.5
  11) SOME_VALUE <= 10; criterion = 1, statistic = 225.148
    12) SOME_VALUE <= 6.5; criterion = 1, statistic = 18.789
      13)* weights = 648
    12) SOME_VALUE > 6.5
      14)* weights = 473
  11) SOME_VALUE > 10
    15) SOME_VALUE <= 16; criterion = 1, statistic = 51.729
      16)* weights = 595
    15) SOME_VALUE > 16
      17) SOME_VALUE <= 23.5; criterion = 0.997, statistic = 8.931
        18)* weights = 488
      17) SOME_VALUE > 23.5
        19)* weights = 365
I prefer the output of print(), but it seems to be lacking the y = (0.96, 0.04) values.
Ideally, I would like my output to look something like this:
1) SOME_VALUE <= 2.5; criterion = 1, statistic = 1306.478
  2) SOME_VALUE <= -10.5; criterion = 1, statistic = 173.416
    3) SOME_VALUE <= -16; criterion = 1, statistic = 19.385
      4)* weights = 275; y = (0.96, 0.04)
    3) SOME_VALUE > -16
      5)* weights = 261; y = (0.831, 0.169)
  2) SOME_VALUE > -10.5
...
How do I go about accomplishing this?
Answer 1:
It is possible to do this with the partykit package (the successor to party), but even there it requires some hacking. In principle, the print() method can be customized with panel functions for inner and terminal nodes. However, these panel functions are not very elegant, even for seemingly simple tasks like this one.
As you appear to have used a tree with a bivariate response, let's consider this simple (albeit not very meaningful) reproducible example:
library("partykit")
airq <- subset(airquality, !is.na(Ozone))
ct <- ctree(Ozone + Wind ~ ., data = airq)
For the inner nodes, let's assume we just want to show the p-value that is readily available in the $info of each node. We can format this via:
ip <- function(node) formatinfo_node(node,
  prefix = " ",
  FUN = function(info) paste0("[p = ", format.pval(info$p.value), "]")
)
For the terminal nodes, we want to show the number of observations (assuming no weights have been used) and the mean response. Both are pre-computed in small tables and then accessed via the $id of each node:
n <- table(ct$fitted[["(fitted)"]])
m <- aggregate(ct$fitted[["(response)"]], list(ct$fitted[["(fitted)"]]), mean)
m <- apply(m[, -1], 1, function(x) paste(round(x, digits = 3), collapse = ", "))
names(m) <- names(n)
The panel function is then defined by:
tp <- function(node) formatinfo_node(node,
  prefix = ": ",
  FUN = function(info) paste0(
    "n = ", n[as.character(node$id)],
    ", y = (", m[as.character(node$id)], ")"
  )
)
To apply this in the print() method, we need to call print.party() directly because currently print.constparty() does not pass this on correctly. (We will have to fix this in the partykit package.)
print.party(ct, inner_panel = ip, terminal_panel = tp)
## [1] root
## | [2] Temp <= 82 [p = 0.0044842]
## | | [3] Temp <= 77: n = 52, y = (18.615, 11.562)
## | | [4] Temp > 77: n = 27, y = (41.815, 9.737)
## | [5] Temp > 82: n = 37, y = (75.405, 7.565)
This is hopefully close to what you wanted to do and should give you a template for further modifications.
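The example above uses a numeric bivariate response, so the terminal-node table holds means. If the response is a factor instead, the values of the form y = (0.96, 0.04) would presumably be class proportions, and the lookup table can be built with prop.table() rather than aggregate(). The following is only a sketch under assumptions: the iris data stands in for the real data, no case weights are used, and the names ct2, node_id, resp, p and y_lab are illustrative and not part of partykit.

```r
library("partykit")

# Illustrative data only: iris stands in for the asker's data set.
ct2 <- ctree(Species ~ ., data = iris)

# Terminal node id and observed response for every observation,
# read from the same $fitted columns used in the answer above.
node_id <- ct2$fitted[["(fitted)"]]
resp <- ct2$fitted[["(response)"]]

# Row-wise proportions: one row per terminal node, one column per class.
p <- prop.table(table(node_id, resp), margin = 1)

# Collapse each row into a "y = (...)" label, keyed by node id
# (assumption: unweighted observations).
y_lab <- apply(p, 1, function(x) paste(round(x, digits = 3), collapse = ", "))
y_lab <- paste0("y = (", y_lab, ")")
names(y_lab) <- rownames(p)

y_lab
```

The resulting y_lab vector can be indexed with as.character(node$id) inside a terminal panel function, in the same way that m is used in tp above.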
Source: https://stackoverflow.com/questions/33356122/displaying-inference-tree-node-values-with-print