Let\'s suppose I have a column with categorical data \"red\" \"green\" \"blue\" and empty cells
red
green
red
blue
NaN
I\'m sure that the NaN b
In addition to Lan's answer's approach, which seems most commonly used, you can use something based on matrix factorization. For example there is a variant of Generalized Low Rank Models that can impute such data, just as probabilistic matrix factorization is used to impute continuous data.
GLRMs can be used from H2O which provides bindings for both Python and R.