Suppose I have a pandas DataFrame like the one below and I'm encoding categorical_1 for training in scikit-learn:
data = {'numeric_1': [12.1, 3.2, 5.5, 6.8, 9.9],
To handle the mismatch between the sets of categorical values in the train and test sets, I used:
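import numpy as np
import pandas as pd

# Zero-filled placeholder for dummy columns that exist only in the training set.
# Its length must match the test set, since it is assigned into the test frame.
length = test_categorical_data.shape[0]
empty_col = np.zeros(length)
test_categorical_data_processed = pd.DataFrame(index=test_categorical_data.index)
for col in train_categorical_data.columns:
    test_categorical_data_processed[col] = test_categorical_data.get(col, empty_col)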
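For comparison, the same column alignment can probably be done without the explicit loop by reindexing the encoded test frame against the training columns. This is only a minimal sketch, not necessarily the approach used above: the toy frames `train` and `test` and their values are made up for illustration, and it assumes the categorical column was encoded with `pd.get_dummies`.

```python
import pandas as pd

# Hypothetical toy data: the column name follows the question,
# the values are invented for illustration.
train = pd.DataFrame({"categorical_1": ["a", "b", "c"]})
test = pd.DataFrame({"categorical_1": ["a", "d"]})

# One-hot encode each set separately; dtype=int keeps the output as 0/1 integers.
train_dummies = pd.get_dummies(train, columns=["categorical_1"], dtype=int)
test_dummies = pd.get_dummies(test, columns=["categorical_1"], dtype=int)

# Align the test columns to the training columns:
# - categories seen only in train ("b", "c") are added and filled with 0
# - categories seen only in test ("d") are dropped
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)

print(test_aligned)
```

Note that reindexing silently drops categories that appear only in the test set, so any rows with unseen values end up as all zeros across the dummy columns.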