Replace missing values in categorical data
Question: Let's suppose I have a column with categorical data ("red", "green", "blue") and some empty cells:

red
green
red
blue
NaN

I'm sure that the NaN belongs to one of red, green, or blue. Should I replace the NaN with the average of the colors (i.e. the mean of the observed one-hot rows), or is that too strong an assumption? After one-hot encoding (col1 = red, col2 = green, col3 = blue), the data would become:

col1 | col2 | col3
1    | 0    | 0
0    | 1    | 0
1    | 0    | 0
0    | 0    | 1
0.5  | 0.25 | 0.25

Or should I even scale the last row down, keeping the ratio, so that these values have less influence?

0.25 | 0.125 | 0.125

Usually, what is the best practice?

Answer 1: It depends on what you want to do with
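The fill proposed in the question can be sketched in plain Python. This is only an illustration of the arithmetic, not a recommendation: the helper `one_hot_with_frequency_fill` and its `scale` parameter are hypothetical names, and a missing cell is assumed to be either `None` or a float NaN. Observed values are one-hot encoded; each missing row receives the observed category frequencies (2/4, 1/4, 1/4 for the example column), optionally multiplied by `scale` to shrink its influence.

```python
import math
from collections import Counter


def is_missing(value):
    # Assumption: a missing cell is represented as None or a float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def one_hot_with_frequency_fill(values, categories, scale=1.0):
    """Hypothetical helper: one-hot encode `values` over `categories`.

    Rows with a missing value get the observed category frequencies
    (the mean of the observed one-hot rows), multiplied by `scale`.
    """
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("no observed values to estimate frequencies from")
    counts = Counter(observed)
    freqs = [counts[c] / len(observed) for c in categories]

    rows = []
    for v in values:
        if is_missing(v):
            # Frequency-weighted fill, optionally scaled down.
            rows.append([scale * f for f in freqs])
        else:
            # Standard one-hot row for an observed category.
            rows.append([1.0 if v == c else 0.0 for c in categories])
    return rows


colors = ["red", "green", "red", "blue", None]
categories = ["red", "green", "blue"]

print(one_hot_with_frequency_fill(colors, categories)[-1])             # [0.5, 0.25, 0.25]
print(one_hot_with_frequency_fill(colors, categories, scale=0.5)[-1])  # [0.25, 0.125, 0.125]
```

Note that the filled row still sums to `scale` rather than 1, so with `scale < 1` it no longer behaves like a probability distribution over the colors; whether that is acceptable depends on the downstream model.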