My data consists of a mix of continuous and categorical features. Below is a small snippet of what my data looks like in CSV format (consider it as data collected by a su
You may also consider making the categorical variables numerical, e.g. via indicator variables, a procedure also known as one-hot encoding.
Try
from sklearn.preprocessing import OneHotEncoder
and fit it to your categorical data, followed by a numerical estimation method such as linear regression. As long as there aren't too many categories (a city column may have too many levels), this can work well.
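For concreteness, here is a minimal sketch of that pipeline. The file name survey.csv, the target column income, and the numeric column age are assumptions for illustration (only city comes from your description); swap in your own column lists:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("survey.csv")      # hypothetical file name
categorical_cols = ["city"]         # your categorical columns
numeric_cols = ["age"]              # your continuous columns (made up here)

# One-hot encode the categorical columns, pass numeric columns through unchanged
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)

model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
model.fit(df[categorical_cols + numeric_cols], df["income"])  # hypothetical target
```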
As for discretization of continuous variables, you may consider binning with adaptive bin sizes or, equivalently, uniform binning after histogram equalization. numpy.histogram may be helpful here. Also, while Fayyad-Irani discretization isn't implemented in sklearn, feel free to check out sklearn.cluster for adaptive discretizations of your data (even if it is only 1D), e.g. via KMeans.
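As a rough sketch of both ideas on a single continuous column (the array x, the bin counts, and the number of clusters below are placeholder choices, not values from your data):

```python
import numpy as np
from sklearn.cluster import KMeans

x = np.random.default_rng(0).normal(size=1000)   # placeholder 1D feature

# Uniform binning: numpy.histogram gives the edges, digitize assigns bin labels
counts, edges = np.histogram(x, bins=10)
uniform_bins = np.digitize(x, edges[1:-1])

# Adaptive ("equal-count") binning via quantile edges
quantile_edges = np.quantile(x, np.linspace(0, 1, 11))
quantile_bins = np.digitize(x, quantile_edges[1:-1])

# Adaptive 1D discretization via KMeans: cluster labels serve as bin indices
km = KMeans(n_clusters=5, n_init=10, random_state=0)
kmeans_bins = km.fit_predict(x.reshape(-1, 1))
```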