How to preProcess features when some of them are factors?

有刺的猬 2021-01-04 05:05

My question is related to this one regarding categorical data (factors in R terms) when using the caret package. I understand from the linked post that if you use the "fo

2 Answers
  •  花落未央
    2021-01-04 05:57

    Here's a quick way to exclude factors or whatever you'd like from consideration:

    library(caret)  # provides preProcess()
    set.seed(1)
    N <- 20
    dat <- data.frame( 
        x = factor(sample(LETTERS[1:5],N,replace=TRUE)),
        y = rnorm(N,5,12),
        z = rnorm(N,-5,17) + runif(N,2,12)
    )
    
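    # Wrap preProcess() so that columns of the excluded classes (factors by
    # default) are left untouched; extra arguments in ... go to preProcess().
    ppWrapper <- function( x, excludeClasses=c("factor"), ... ) {
        # TRUE for each column that belongs to any of the excluded classes
        whichToExclude <- sapply( x, function(y) any(sapply(excludeClasses, function(excludeClass) is(y,excludeClass) )) )
        # Estimate the transformation on the remaining columns and apply it to them
        processedMat <- predict( preProcess( x[!whichToExclude], ...), newdata=x[!whichToExclude] )
        x[!whichToExclude] <- processedMat
        x
    }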
    
    > ppWrapper(dat)
       x          y           z
    1  C  1.6173595 -0.44054795
    2  A -0.2933705 -1.98856921
    3  C  1.2177384  0.65420288
    4  D -0.8710374  0.62409408
    5  D -0.4504202 -0.34048640
    6  D -0.6943283  0.24236671
    7  E  0.7778192  0.91606677
    8  D  0.2184563 -0.44935163
    9  C -0.3611408  0.26075970
    10 B -0.7066441 -0.23046073
    11 D -1.5154339 -0.75549761
    12 D  0.4504825  0.38552988
    13 B  1.5692675  0.04093040
    14 C  0.4127541  0.13161807
    15 D  0.5426321  1.09527418
    16 B -2.1040322 -0.04544407
    17 C  0.6928574  1.12090541
    18 B  0.3580960  1.91446230
    19 E  0.3619967 -0.89018040
    20 A -1.2230522 -2.24567237
    

    Any extra arguments you give to ppWrapper are forwarded through ... to preProcess, so you can choose the transformation in the usual way.
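    As a rough illustration of that pass-through, the sketch below forwards method = "range" (one of caret's documented preProcess methods) through the wrapper. The wrapper is restated compactly with inherits() so the snippet runs on its own; that restatement and the name scaled01 are my own choices for the example, not part of the answer above.

```r
library(caret)

set.seed(1)
N <- 20
dat <- data.frame(
    x = factor(sample(LETTERS[1:5], N, replace = TRUE)),
    y = rnorm(N, 5, 12),
    z = rnorm(N, -5, 17) + runif(N, 2, 12)
)

# Compact restatement of the wrapper (assumption: inherits() used in place of is()).
ppWrapper <- function(x, excludeClasses = c("factor"), ...) {
    # TRUE for each column whose class matches any excluded class
    whichToExclude <- sapply(x, function(col) inherits(col, excludeClasses))
    # Fit the transformation on the remaining columns and apply it to them
    x[!whichToExclude] <- predict(
        preProcess(x[!whichToExclude], ...),
        newdata = x[!whichToExclude]
    )
    x
}

# method = "range" travels through ... to preProcess()
scaled01 <- ppWrapper(dat, method = "range")

range(scaled01$y)             # numeric column should now span 0 to 1
identical(scaled01$x, dat$x)  # factor column should be unchanged
```

    The same mechanism should work for other preProcess arguments, and excludeClasses can be widened (for example to c("factor", "character")) if other column types need to be skipped.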
