Label Encoder functionality in R?

前端未结


In Python, scikit-learn has a great class called LabelEncoder that maps categorical levels (strings) to an integer representation.

Is there anything in R to do this?


9 Answers

眼角桃花

2021-02-06 09:27
Here is an easy and neat solution. The superml package (https://www.rdocumentation.org/packages/superml/versions/0.5.3) provides a LabelEncoder class (https://www.rdocumentation.org/packages/superml/versions/0.5.3/topics/LabelEncoder):
```
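install.packages("superml")
library(superml)

# sample_dat is your data frame; column is the categorical column to encode
lbl <- LabelEncoder$new()
# fit_transform() learns the levels and returns the integer codes in one step
sample_dat$column <- lbl$fit_transform(sample_dat$column)
# inverse_transform() maps the integer codes back to the original labels
decode_names <- lbl$inverse_transform(sample_dat$column)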
```
心在旅途

2021-02-06 09:29
Create your vector of data:
```
colors <- c("red", "red", "blue", "green")
```
Create a factor:
```
factors <- factor(colors)
```
Convert the factor to numbers:
```
as.numeric(factors)
```
Output (note that the codes follow the alphabetical order of the levels):
```
# [1] 3 3 1 2
```
You can also set a custom numbering by specifying the levels yourself (the output now follows the "rainbow color order" that I defined):
```
rainbow <- c("red","orange","yellow","green","blue","purple")
ordered <- factor(colors, levels = rainbow)
as.numeric(ordered)
# [1] 1 1 5 4
```
See ?factor.
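If you want something closer to the fit / transform / inverse_transform workflow of a label encoder, the factor approach above can be wrapped in a few small functions. This is only a minimal sketch in base R: the helper names label_encoder_fit, label_encoder_transform and label_encoder_inverse are made up for illustration and are not part of any package. Note that the codes are 1-based, whereas scikit-learn's LabelEncoder starts at 0.

```r
# Hypothetical helpers (not from any package), base R only.

label_encoder_fit <- function(x) {
  # The sorted unique values act as the known classes
  sort(unique(as.character(x)))
}

label_encoder_transform <- function(x, classes) {
  # Values that were not seen during fitting become NA
  as.integer(factor(x, levels = classes))
}

label_encoder_inverse <- function(codes, classes) {
  # Integer codes index straight back into the class vector
  classes[codes]
}

colors  <- c("red", "red", "blue", "green")
classes <- label_encoder_fit(colors)
codes   <- label_encoder_transform(colors, classes)
decoded <- label_encoder_inverse(codes, classes)

# Reuse the same mapping on new data; "purple" was never seen
new_codes <- label_encoder_transform(c("green", "purple"), classes)
```

Keeping the classes vector around is what lets you apply the same mapping to a test set later, instead of calling factor() again and risking a different numbering.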
日久生厌

2021-02-06 09:30

It's hard to believe that no one has mentioned caret's dummyVars function.

This is a widely searched question. People don't want to write their own methods or copy and paste other users' methods; they want a package, and caret is the closest thing to sklearn in R.

EDIT: I now realize that what the user actually wants is to turn strings into integer codes, which is just as.numeric(as.factor(x)). I'm going to leave this here anyway, because one-hot encoding is often the more appropriate way to encode categorical data that has no natural order.
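For anyone who wants to see what one-hot encoding looks like in practice, here is a small sketch. The data frame df and its color column are made-up example data. The first part uses base R's model.matrix, which needs no extra package; the second part calls caret's dummyVars followed by predict, and runs only if caret is installed.

```r
# Made-up example data: a single factor column
df <- data.frame(color = factor(c("red", "red", "blue", "green")))

# Base R: one indicator column per level ("- 1" drops the intercept)
one_hot <- model.matrix(~ color - 1, data = df)

# caret: build the encoder, then apply it with predict()
if (requireNamespace("caret", quietly = TRUE)) {
  dv <- caret::dummyVars(~ color, data = df)
  caret_one_hot <- predict(dv, newdata = df)
}
```

Each row of one_hot contains a single 1 in the column for that row's level and 0 everywhere else, so no artificial ordering is introduced between the levels.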
