Can sklearn random forest directly handle categorical features?

Say I have a categorical feature, color, which takes the values

['red', 'blue', 'green', 'orange'],

and I want to use it to predict something in a ra

3 Answers

佛祖请我去吃肉

2020-12-04 12:04

Most implementations of random forest (and many other machine learning algorithms) that accept categorical inputs either just automate the encoding of categorical features for you or use a method that becomes computationally intractable for large numbers of categories.
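To give a feel for why the direct approach blows up: if a tree is allowed to send any subset of categories left and the rest right, a feature with k categories has 2^(k-1) - 1 distinct binary splits to consider. The snippet below is only a rough sketch of that count by brute force (the helper name `count_binary_partitions` is made up for illustration and is not part of any library).

```python
from itertools import combinations


def count_binary_partitions(categories):
    """Count the ways to split categories into two non-empty groups."""
    items = list(categories)
    first, rest = items[0], items[1:]
    count = 0
    # Pin the first category to the left side so mirrored splits
    # (left/right swapped) are not counted twice.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = {first, *extra}
            if len(left) < len(items):  # the right side must be non-empty
                count += 1
    return count


colors = ["red", "blue", "green", "orange"]
n_splits = count_binary_partitions(colors)
closed_form = 2 ** (len(colors) - 1) - 1  # same count without enumeration

print(n_splits, closed_form)
```

With four colors that is still small, but the count roughly doubles with every extra category, which is why exhaustive search over subsets stops being practical for high-cardinality features.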

A notable exception is H2O. H2O has a very efficient method for handling categorical data directly, which often gives it an edge over tree-based methods that require one-hot encoding.

This article by Will McGinnis has a very good discussion of one-hot encoding and alternatives.

This article by Nick Dingwall and Chris Potts has a very good discussion of categorical variables and tree-based learners.

盖世英雄少女心

2020-12-04 12:04

You have to turn the categorical variable into a series of dummy variables. Yes, I know it's annoying and seems unnecessary, but that is how sklearn works. If you are using pandas, use pd.get_dummies; it works really well.
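For what it's worth, here is a minimal sketch of the pd.get_dummies route with the color feature from the question. The `size` and `label` columns and all the values are invented purely for illustration, and the `reindex` step is just one way (an assumption on my part, not something sklearn requires you to do this way) to keep the dummy columns aligned when you encode new rows later.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy data made up for this example; only "color" comes from the question.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "orange", "red", "blue", "green", "orange"],
    "size": [1.0, 2.5, 3.1, 0.7, 1.2, 2.4, 3.3, 0.9],
    "label": [0, 1, 1, 0, 0, 1, 1, 0],
})

# Expand "color" into one 0/1 column per category; "size" is left untouched.
X = pd.get_dummies(df[["color", "size"]], columns=["color"], dtype=float)
y = df["label"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# New rows have to be encoded the same way. Reindexing against the training
# columns fills in the dummy columns that this small frame does not produce.
new = pd.DataFrame({"color": ["green"], "size": [3.0]})
new_X = pd.get_dummies(new, columns=["color"], dtype=float)
new_X = new_X.reindex(columns=X.columns, fill_value=0)

pred = model.predict(new_X)
print(list(X.columns), pred)
```

The thing to watch is that get_dummies only knows about the categories present in the frame you hand it, so training and prediction data can end up with different columns unless you align them.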

一整个雨季

2020-12-04 12:09

No, there isn't. Somebody's working on this and the patch might be merged into mainline some day, but right now there's no support for categorical variables in scikit-learn except dummy (one-hot) encoding.
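If you would rather keep the one-hot step inside scikit-learn itself, something along these lines should work. This is only a sketch: the toy data, the `size` column, and the step names `"encode"` and `"forest"` are my own assumptions, and `handle_unknown="ignore"` is there so that a category never seen in training (here `"purple"`) is encoded as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data made up for this example.
train = pd.DataFrame({
    "color": ["red", "blue", "green", "orange", "red", "blue"],
    "size": [1.0, 2.5, 3.1, 0.7, 1.2, 2.4],
})
target = [0, 1, 1, 0, 0, 1]

# One-hot encode "color" and pass the numeric column through unchanged.
encode = ColumnTransformer(
    transformers=[("color", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)

clf = Pipeline([
    ("encode", encode),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])
clf.fit(train, target)

# The encoder learned its categories during fit, so new data is encoded
# consistently, including a color that was not in the training set.
unseen = pd.DataFrame({"color": ["purple"], "size": [2.0]})
prediction = clf.predict(unseen)
n_encoded = clf.named_steps["encode"].transform(unseen).shape[1]
print(prediction, n_encoded)
```

Putting the encoder in a pipeline means the same fitted encoding is reused at prediction time, so you do not have to align columns by hand.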
