Label encoding across multiple columns in scikit-learn

后端 未结 22 1913
礼貌的吻别
礼貌的吻别 2020-11-22 09:02

I\'m trying to use scikit-learn\'s LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to a

22条回答
  •  有刺的猬
    2020-11-22 09:22

    Instead of LabelEncoder we can use OrdinalEncoder from scikit learn, which allows multi-column encoding.

    Encode categorical features as an integer array. The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are converted to ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.

    >>> from sklearn.preprocessing import OrdinalEncoder
    >>> enc = OrdinalEncoder()
    >>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
    >>> enc.fit(X)
    OrdinalEncoder()
    >>> enc.categories_
    [array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
    >>> enc.transform([['Female', 3], ['Male', 1]])
    array([[0., 2.],
           [1., 0.]])
    

    Both the description and example were copied from its documentation page which you can find here:

    https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html#sklearn.preprocessing.OrdinalEncoder

提交回复
热议问题