Question
This might look like a trivial problem, but I am stuck on predicting results from a model. My problem is this:
I have a dataset of shape 1000 x 19 (excluding the target), but after one-hot encoding it becomes 1000 x 141. Since I trained the model on data of shape 1000 x 141, I need data of shape 1 x 141 (at least) for prediction. I also know that in Python I can make a prediction using
model.predict(data)
However, the data I get from an end user through a web portal has shape 1 x 19, and I am confused about how to proceed to make predictions from that user data.
How can I convert data of shape 1 x 19 into 1 x 141 while keeping the same column order as the train/test data? Any help in this direction would be highly appreciated.
Answer 1:
I am assuming you are using sklearn's OneHotEncoder to create the one-hot encoding. If so, the problem is easy to solve, since you fit the encoder on your training data:
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(categories = "auto", handle_unknown = "ignore")
X_train_encoded = encoder.fit_transform(X_train)
In the code above, the encoder is fitted on your training data, so when you get the test data you can transform it into the same encoded layout with this fitted encoder. The new row must have the same 19 columns, in the same order, as X_train.
test_data = encoder.transform(test_data)
Your test data will now also have shape 1 x 141. You can check the shape using
test_data.shape
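To make the steps above concrete, here is a minimal end-to-end sketch. The toy data (3 categorical columns instead of 19), the column names, and the choice of LogisticRegression are all assumptions made for illustration; they are not from the question. The point is only that the encoder fitted on the training data gives a single user row the same width and column order as the training matrix.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy training data: 3 categorical features stand in for the 19.
X_train = pd.DataFrame({
    "color": ["red", "green", "blue", "red"],
    "size": ["S", "M", "L", "M"],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})
y_train = [0, 1, 0, 1]

# Fit the encoder once, on the training data only.
encoder = OneHotEncoder(categories="auto", handle_unknown="ignore")
X_train_encoded = encoder.fit_transform(X_train)  # sparse matrix, 4 rows

# Any estimator would do here; LogisticRegression is just a placeholder.
model = LogisticRegression().fit(X_train_encoded, y_train)

# A single row as it might arrive from the web portal (raw, unencoded).
user_row = pd.DataFrame([{"color": "green", "size": "XL", "city": "Rome"}])

# Put the columns in the training order before encoding.
user_row = user_row[X_train.columns]

# Reuse the fitted encoder: same width and column order as the training matrix.
# "XL" was never seen in training, so handle_unknown="ignore" encodes it as all zeros.
user_encoded = encoder.transform(user_row)

prediction = model.predict(user_encoded)
print(user_encoded.shape, X_train_encoded.shape)
```

In a web application the fitted encoder has to be available at prediction time, so it would typically be saved alongside the model (for example with joblib) rather than refitted on each request.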
Source: https://stackoverflow.com/questions/56133664/predicitng-new-value-through-a-model-trained-on-one-hot-encoded-data