random-forest

Handling different Factor Levels in Train and Test data

寵の児 submitted on 2020-08-24 14:56:58
Question: I have a training data set with 20 columns, all of which are factors that I have to use to train a model. I have also been given a test data set on which I must apply the model to make predictions and submit them. During initial data exploration, out of curiosity, I compared the factor levels in the training and test data, since all of the variables are categorical. To my dismay, most of the variables have different levels in the training and test data sets. For example

Handling different Factor Levels in Train and Test data

折月煮酒 submitted on 2020-08-24 14:43:48
Question: I have a training data set with 20 columns, all of which are factors that I have to use to train a model. I have also been given a test data set on which I must apply the model to make predictions and submit them. During initial data exploration, out of curiosity, I compared the factor levels in the training and test data, since all of the variables are categorical. To my dismay, most of the variables have different levels in the training and test data sets. For example
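
The level comparison described in the question above can be sketched in a few lines. This is only an illustration in plain Python, not the asker's code (the excerpt is cut off before their example): the column names, the values, and the helper `level_differences` are all hypothetical.

```python
# Hypothetical toy data: each column name maps to its list of categorical values.
train = {
    "color": ["red", "green", "blue", "red"],
    "size": ["S", "M", "L", "M"],
}
test = {
    "color": ["red", "purple", "green"],
    "size": ["M", "XL", "S"],
}


def level_differences(train_cols, test_cols):
    """For each column present in both sets, list the levels unique to each side."""
    report = {}
    for name in sorted(set(train_cols) & set(test_cols)):
        train_levels = set(train_cols[name])
        test_levels = set(test_cols[name])
        report[name] = {
            "only_in_train": sorted(train_levels - test_levels),
            "only_in_test": sorted(test_levels - train_levels),
        }
    return report


diff = level_differences(train, test)
for column, levels in diff.items():
    print(column, levels)
```

Levels listed under `only_in_test` are the ones a model fitted on the training data has never seen, so they are usually the ones that need a decision (for instance, mapping them to a catch-all level) before predicting.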

Random Forest with GridSearchCV - Error on param_grid

风流意气都作罢 submitted on 2020-08-21 01:11:06
Question: I'm trying to create a Random Forest model with GridSearchCV but am getting an error pertaining to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()`." I'm classifying documents, so I am also adding a tf-idf vectorizer to the pipeline. Here is the code: from sklearn import metrics from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import classification_report, f1_score,
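
The asker's code is cut off in this excerpt, so the actual cause cannot be confirmed. A common reason for this message, though, is that the keys in `param_grid` name the forest's parameters directly. When the estimator is a scikit-learn `Pipeline`, each key needs the form `<step name>__<parameter>`. A minimal sketch follows, assuming hypothetical step names `tfidf` and `clf` and a made-up toy corpus:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus, only large enough for 2-fold cross-validation.
docs = [
    "the cat sat on the mat", "dogs chase the cat",
    "a kitten and a puppy play", "my dog barks at night",
    "stocks fell sharply today", "the market rallied on earnings",
    "investors sold their bonds", "interest rates rose again",
]
labels = ["pets", "pets", "pets", "pets",
          "finance", "finance", "finance", "finance"]

# The step names ("tfidf", "clf") are assumptions; any names work as long as
# the grid keys below use the same prefixes.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# A bare "max_features" key would raise the ValueError quoted above,
# because the Pipeline itself has no such parameter.
param_grid = {
    "clf__max_features": ["sqrt", "log2"],
    "clf__max_depth": [2, None],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)
print(search.best_params_)
```

Calling `pipeline.get_params().keys()`, as the error message suggests, lists every valid key, including the prefixed ones.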