replace missing values in categorical data

孤独总比滥情好 2021-01-27 00:00

Let's suppose I have a column with categorical data ("red", "green", "blue") and some empty cells:

red
green
red
blue
NaN

I'm sure that the NaN b

3 Answers
  •  梦毁少年i
    2021-01-27 00:35

    It depends on what you want to do with the data. Is the average of these colors useful for your purpose? Averaging creates a new possible value, which is probably not what you want, especially since this is categorical data and averaging treats it as if it were numeric.
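    To make the point about averaging concrete, here is a small sketch using the column from the question. The integer encoding (red=0, green=1, blue=2) is a hypothetical choice made only for illustration; any encoding of categories as numbers is arbitrary in the same way.

```python
from collections import Counter

# The column from the question; None stands in for the NaN cell.
colors = ["red", "green", "red", "blue", None]

# Hypothetical integer encoding, chosen only for illustration.
codes = {"red": 0, "green": 1, "blue": 2}
decode = {v: k for k, v in codes.items()}

observed = [c for c in colors if c is not None]

# Averaging the codes: (0 + 1 + 0 + 2) / 4 = 0.75, which is not a valid code.
mean_code = sum(codes[c] for c in observed) / len(observed)

# Rounding back to a code lands on "green", although green is not the most frequent color.
rounded_color = decode[round(mean_code)]

# The most frequent category (the mode) is a value that actually occurs in the data.
mode_color = Counter(observed).most_common(1)[0][0]

print(mean_code, rounded_color, mode_color)  # 0.75 green red
```

    Reordering the arbitrary codes changes which color the rounded average lands on, which is another sign that an average has no meaning for categories, while the mode does not depend on the encoding at all.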

    In machine learning, you would replace the missing values with the most common category among the rows that share the same value of the target attribute (what you want to predict).

    Example: you want to predict whether a person is male or female by looking at their car, and the color feature has some missing values. If most cars of male (female) drivers are blue (red), you would use that value to fill the missing colors of cars driven by males (females).
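    The car example above can be sketched in plain Python as below. The function name `fill_by_target_mode`, the list-of-dicts layout and the sample rows are all hypothetical, made up for this sketch; None marks a missing color. Note that this fill is only possible for rows whose target is known (training data); at prediction time the target is unknown, so a fallback such as the overall most common color would be needed.

```python
from collections import Counter, defaultdict


def fill_by_target_mode(rows, feature, target):
    """Hypothetical helper: fill missing `feature` values with the most common
    feature value among rows that share the same `target` value.

    `rows` is a list of dicts; None marks a missing value. Returns new dicts
    and leaves the input rows unchanged.
    """
    # Count the observed feature values separately for each target class.
    counts = defaultdict(Counter)
    for row in rows:
        if row[feature] is not None:
            counts[row[target]][row[feature]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)
        # An empty Counter is falsy, so classes with no observed values stay missing.
        if new_row[feature] is None and counts[new_row[target]]:
            # most_common(1) returns [(value, count)]; ties go to the value seen first.
            new_row[feature] = counts[new_row[target]].most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Made-up sample data: male drivers mostly have blue cars, female drivers mostly red.
cars = [
    {"driver": "male", "color": "blue"},
    {"driver": "male", "color": "blue"},
    {"driver": "male", "color": "red"},
    {"driver": "male", "color": None},
    {"driver": "female", "color": "red"},
    {"driver": "female", "color": "red"},
    {"driver": "female", "color": "green"},
    {"driver": "female", "color": None},
]

filled = fill_by_target_mode(cars, "color", "driver")
print([row["color"] for row in filled])
# ['blue', 'blue', 'red', 'blue', 'red', 'red', 'green', 'red']
```

    The missing male-driver color becomes "blue" and the missing female-driver color becomes "red", matching the reasoning in the example.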
