How to handle missing values (NaNs) for machine learning in Python
How should missing values in a dataset be handled before applying a machine learning algorithm?

I have noticed that simply dropping the missing (NaN) values is not a smart approach. I usually impute them (fill in the mean) using pandas, which more or less works and improves classification accuracy, but it may not be the best thing to do.

So here is the important question: what is the best way to handle missing values in a dataset?

For example, in this dataset only 30% has the original data:

Int64Index: 7049 entries, 0 to 7048
Data columns (total 31 columns):
left_eye_center_x 7039 non-null float64
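
For reference, the mean fill described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used on the real data: the tiny DataFrame, its values, and the `left_eye_center_y` column are assumptions made up for the example; only `left_eye_center_x` appears in the output above.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset; the values are invented
# for illustration, and left_eye_center_y is a hypothetical column.
df = pd.DataFrame({
    "left_eye_center_x": [66.0, np.nan, 65.0, 64.0],
    "left_eye_center_y": [39.0, 35.0, np.nan, 37.0],
})

# Count missing values per column before filling.
missing_before = df.isna().sum()

# Replace each NaN with the mean of its own column.
# DataFrame.mean() skips NaN values by default.
df_filled = df.fillna(df.mean())

# Count missing values per column after filling.
missing_after = df_filled.isna().sum()

print(missing_before)
print(df_filled)
```

Note that pandas' `interpolate()` is a different operation (it estimates a value from neighbouring rows), so the mean fill shown here is imputation rather than interpolation in the strict sense.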