How to do discretization of continuous attributes in sklearn?

萌比男神i 2021-01-02 08:41

My data consists of a mix of continuous and categorical features. Below is a small snippet of what my data looks like in CSV format (Consider it as data collected by a su

5 answers
  •  有刺的猬
    2021-01-02 09:06

    You may also consider making the categorical variables numerical, e.g. via indicator variables, a procedure also known as one-hot encoding.

    Try

    from sklearn.preprocessing import OneHotEncoder
    

    and fit it to your categorical data, then apply a numerical estimation method such as linear regression. As long as there aren't too many categories (city may have a few too many), this can work well.
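
    A minimal sketch of that idea, assuming a toy table whose column names (age, city, income) and values are made up for illustration. ColumnTransformer and make_pipeline are one way to wire the encoder to a regressor; they are not mentioned above and other arrangements work just as well.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column names and values are invented for this example.
df = pd.DataFrame({
    "age": [23, 35, 47, 52, 29, 41],
    "city": ["A", "B", "A", "C", "B", "C"],
    "income": [30.0, 52.0, 61.0, 75.0, 40.0, 66.0],
})
X = df[["age", "city"]]
y = df["income"]

# One-hot encode the categorical column, pass the continuous one through unchanged.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)

# Chain the encoding with a numerical estimator.
model = make_pipeline(preprocess, LinearRegression())
model.fit(X, y)

# Inspect the encoded design matrix: 3 indicator columns for city plus age.
encoded = model.named_steps["columntransformer"].transform(X)
predictions = model.predict(X)
```

    With handle_unknown="ignore", a city unseen during fitting is encoded as all zeros instead of raising an error, which may or may not be what you want.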

    As for discretization of continuous variables, you may consider binning with an adaptive bin size or, equivalently, uniform binning after histogram normalization. numpy.histogram may be helpful here. Also, while Fayyad-Irani discretization isn't implemented in sklearn, feel free to check out sklearn.cluster for adaptive discretizations of your data (even if it is only 1D), e.g. via KMeans.
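
    The binning options above can be sketched as follows, assuming a made-up 1D array of ages and an arbitrary choice of 3 bins. For the clustering variant this sketch swaps in sklearn.preprocessing.KBinsDiscretizer with strategy="kmeans" rather than calling sklearn.cluster.KMeans directly; it runs 1D k-means and converts the cluster centers into bin edges.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical continuous feature; the values are invented for this example.
ages = np.array([18, 22, 25, 31, 38, 44, 52, 67], dtype=float)

# Equal-width bins: numpy.histogram returns the counts and the bin edges.
counts, edges = np.histogram(ages, bins=3)
# Map each value to its bin index using the interior edges only.
width_bins = np.digitize(ages, edges[1:-1])

# Adaptive (equal-frequency) bins: place the edges at quantiles of the data.
quantile_edges = np.quantile(ages, [1 / 3, 2 / 3])
freq_bins = np.digitize(ages, quantile_edges)

# Cluster-based bins: 1D k-means determines where the edges fall.
kmeans_binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="kmeans")
kmeans_bins = kmeans_binner.fit_transform(ages.reshape(-1, 1)).ravel().astype(int)
```

    Each result is an array of integer bin labels, one per input value, which can then be treated as an ordinal or categorical feature.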
