Boruta box plots in R

邮差的信 submitted on 2021-02-07 23:57:47

Question


I'm doing variable selection with the Boruta package in R. Boruta gives me the standard series of boxplots in a single graph, which is useful, but because I have too many predictors, I would like to limit the number of boxplots that appear in the Boruta plot. Something like the following image.

Basically, I want to "zoom" in on the right end of the plot, but I have no idea how to do that with the Boruta plot object.

Thanks,

MR


Answer 1:


This sounds like a simple question, but the solution seems surprisingly convoluted. Perhaps somebody can come up with a quicker or more elegant way...

Here, I create a new function based on the source function plot.Boruta, and add a function argument pars that takes the names of variables/predictors that we'd like to include in the plot.

As an example, I use the iris dataset to fit a model.

# Fit model to the iris dataset
library(Boruta);
fit <- Boruta(Species ~ ., data = iris, doTrace = 2);

The function generateCol is internally called by plot.Boruta, but is not exported and therefore not available outside of the package. However, we need the function for our revised plot.Boruta routine.

# generateCol is needed by plot.Boruta
generateCol<-function(x,colCode,col,numShadow){
 #Checking arguments
 if(is.null(col) && length(colCode)!=4)
  stop('colCode should have 4 elements.');
 #Generating col
 if(is.null(col)){
  rep(colCode[4],length(x$finalDecision)+numShadow)->cc;
  cc[c(x$finalDecision=='Confirmed',rep(FALSE,numShadow))]<-colCode[1];
  cc[c(x$finalDecision=='Tentative',rep(FALSE,numShadow))]<-colCode[2];
  cc[c(x$finalDecision=='Rejected',rep(FALSE,numShadow))]<-colCode[3];
  col=cc;
 }
 return(col);
}
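
As an alternative to pasting the definition, an unexported function can usually be fetched from the package namespace at run time. The sketch below wraps utils::getFromNamespace in a small helper that returns NULL when the lookup fails. Note that get_internal is a hypothetical name, not part of Boruta, and whether generateCol exists under that name depends on the installed Boruta version, so the copied definition above remains the safe fallback.

```r
# Hypothetical helper (not part of Boruta): fetch an object from a package
# namespace, exported or not; returns NULL if the package or object is missing.
get_internal <- function(name, pkg) {
  tryCatch(
    utils::getFromNamespace(name, pkg),
    error = function(e) NULL
  )
}

# Assumption: the installed Boruta version still defines generateCol internally.
# If the lookup returns NULL, keep using the copied definition instead.
internal_generateCol <- get_internal("generateCol", "Boruta")
if (!is.null(internal_generateCol)) generateCol <- internal_generateCol
```

Relying on package internals is fragile across versions, which is one reason to prefer the explicit copy when the plot has to be reproducible.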

We now modify plot.Boruta, and add a function parameter pars, by which we filter our list of variables.

# Modified plot.Boruta
plot.Boruta.sel <- function(
    x,
    pars = NULL,
    colCode = c('green','yellow','red','blue'),
    sort = TRUE,
    whichShadow = c(TRUE, TRUE, TRUE),
    col = NULL, xlab = 'Attributes', ylab = 'Importance', ...) {

    #Checking arguments
    # inherits() is safe even when class(x) has more than one element
    if(!inherits(x, 'Boruta'))
        stop('This function needs Boruta object as an argument.');
    if(is.null(x$ImpHistory))
        stop('Importance history was not stored during the Boruta run.');

    #Removal of -Infs and conversion to a list
    lz <- lapply(1:ncol(x$ImpHistory), function(i)
        x$ImpHistory[is.finite(x$ImpHistory[,i]),i]);
    colnames(x$ImpHistory)->names(lz);

    #Selection of shadow meta-attributes
    numShadow <- sum(whichShadow);
    lz <- lz[c(rep(TRUE,length(x$finalDecision)), whichShadow)];

    #Generating color vector
    col <- generateCol(x, colCode, col, numShadow);

    #Ordering boxes due to attribute median importance
    if (sort) {
        ii <- order(sapply(lz, stats::median));
        lz <- lz[ii];
        col <- col[ii];
    }

    # Select parameters of interest; filter col too so colors stay aligned
    if (!is.null(pars)) {
        keep <- names(lz) %in% pars;
        lz <- lz[keep];
        col <- col[keep];
    }

    #Final plotting
    graphics::boxplot(lz, xlab = xlab, ylab = ylab, col = col, ...);
    invisible(x);
}

Now all we need to do is call plot.Boruta.sel instead of plot, and specify the variables that we'd like to include.

plot.Boruta.sel(fit, pars = c("Sepal.Length", "Sepal.Width"));
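
To get the "zoom on the right end" effect from the question without typing names by hand, the names of the top-ranked attributes can be computed from the importance history and passed as pars. This is a minimal sketch, assuming that fit$ImpHistory is a matrix with one named column per attribute plus shadow columns whose names start with "shadow". The helper name top_attributes is hypothetical, not a Boruta function.

```r
# Hypothetical helper: names of the k attributes with the highest median
# importance, i.e. the boxes at the right end of the sorted Boruta plot.
top_attributes <- function(imp_history, k = 10, drop_shadow = TRUE) {
  # Median per column, ignoring the -Inf entries recorded after rejection
  med <- apply(imp_history, 2, function(v) stats::median(v[is.finite(v)]))
  # Assumption: shadow meta-attributes are the columns named "shadow..."
  if (drop_shadow) med <- med[!grepl("^shadow", names(med))]
  # sort() drops NA medians (columns with no finite values) by default
  utils::head(names(sort(med, decreasing = TRUE)), k)
}

# Usage with the model fitted above (requires Boruta and plot.Boruta.sel):
# plot.Boruta.sel(fit, pars = top_attributes(fit$ImpHistory, k = 2))
```

The order of the names in pars does not matter, since plot.Boruta.sel sorts the boxes by median importance before filtering.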



Source: https://stackoverflow.com/questions/47342553/boruta-box-plots-in-r
