Label Encoder functionality in R?

你的背包 2021-02-06 08:56

In Python, scikit-learn has a great class called LabelEncoder that maps categorical levels (strings) to an integer representation.

Is there anything in R to do this?

9 Answers
  • 2021-02-06 09:27

    Here is an easy and neat solution:

    The superml package (https://www.rdocumentation.org/packages/superml/versions/0.5.3) provides a LabelEncoder class (https://www.rdocumentation.org/packages/superml/versions/0.5.3/topics/LabelEncoder):

    install.packages("superml")
    library(superml)
    
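    lbl <- LabelEncoder$new()
    # fit_transform() learns the levels and encodes in one call,
    # so a separate fit() beforehand is redundant
    sample_dat$column <- lbl$fit_transform(sample_dat$column)
    # map the integer codes back to the original labels
    decode_names <- lbl$inverse_transform(sample_dat$column)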
    
  • 2021-02-06 09:29

    Create your vector of data:

    colors <- c("red", "red", "blue", "green")
    

    Create a factor:

    factors <- factor(colors)
    

    Convert the factor to numbers:

    as.numeric(factors)
    

    Output (note that the codes follow the alphabetical order of the levels):

    # [1] 3 3 1 2
    

    You can also set a custom numbering system: (note that the output now follows the "rainbow color order" that I defined)

    rainbow <- c("red","orange","yellow","green","blue","purple")
    ordered <- factor(colors, levels = rainbow)
    as.numeric(ordered)
    # [1] 1 1 5 4
    

    See ?factor.
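
    The factor approach above can be wrapped into something closer to scikit-learn's fit/transform/inverse workflow. This is only a minimal sketch: `fit_labels`, `encode_labels`, and `decode_labels` are hypothetical helper names, not functions from base R or any package, and the 0-based codes are chosen here to mimic scikit-learn.

    ```r
    # Hypothetical helpers (not from any package): a tiny label encoder built on factor().

    # "fit": learn the sorted set of distinct labels
    fit_labels <- function(x) sort(unique(as.character(x)))

    # "transform": as.integer() on a factor gives 1-based codes; subtracting 1
    # yields 0-based codes. Labels not present in `levels` become NA.
    encode_labels <- function(x, levels) {
      as.integer(factor(x, levels = levels)) - 1L
    }

    # "inverse transform": map 0-based codes back to the labels
    decode_labels <- function(codes, levels) levels[codes + 1L]

    colors <- c("red", "red", "blue", "green")
    lv      <- fit_labels(colors)          # "blue" "green" "red"
    codes   <- encode_labels(colors, lv)   # 2 2 0 1
    decoded <- decode_labels(codes, lv)    # same as colors
    ```

    Keeping the learned levels in a separate object lets the same mapping be reused on new data, where unseen labels show up as NA instead of silently shifting the codes.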

  • 2021-02-06 09:30

    It's hard to believe that no one has mentioned caret's dummyVars function.

    This is a widely searched question. People don't want to write their own methods or copy and paste other users' methods; they want a package, and caret is the closest thing to sklearn in R.

    EDIT: I now realize that what the user actually wants is to turn strings into integer codes, which is just as.numeric(as.factor(x)). I'm going to leave this here anyway, because in my view one-hot encoding is the more accurate method of encoding categorical data.
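
    To make the difference between the two encodings concrete, here is a rough sketch that applies both to the same column. It assumes the caret package is installed; the data frame `df` and its `color` column are made up for illustration.

    ```r
    library(caret)  # assumption: caret is installed

    # Made-up example data
    df <- data.frame(color = factor(c("red", "red", "blue", "green")))

    # Integer codes: one number per level (levels are alphabetical by default)
    codes <- as.numeric(as.factor(df$color))   # 3 3 1 2

    # One-hot encoding: dummyVars() builds the encoder, predict() applies it,
    # returning a matrix with one 0/1 column per level
    dv     <- dummyVars(~ color, data = df)
    onehot <- predict(dv, newdata = df)
    ```

    Integer codes keep a single column but imply an ordering between levels, while the one-hot matrix avoids that at the cost of one column per level.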
