Python ValueError: ColumnTransformer, Column Ordering Is Not Equal

Submitted by 这一生的挚爱 on 2020-05-29 09:44:39

Question


I put together the following function, which reads a CSV file, trains the model and predicts on the request data.

I get the following ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword

The training data and the data used for prediction have exactly the same number of columns, i.e., 15, so I am not sure how the "ordering" of the columns could have changed.

~/.local/lib/python3.5/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
    417         Xt = X
    418         for _, name, transform in self._iter(with_final=False):
--> 419             Xt = transform.transform(Xt)
    420         return self.steps[-1][-1].predict(Xt, **predict_params)
    421 

~/.local/lib/python3.5/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
    581             if (n_cols_transform >= n_cols_fit and
    582                     any(X.columns[:n_cols_fit] != self._df_columns)):
--> 583                 raise ValueError('Column ordering must be equal for fit '
    584                                  'and for transform when using the '
    585                                  'remainder keyword')

ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword
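The two lines of sklearn/compose/_column_transformer.py shown in the traceback contain the condition that raises. As a stdlib-only sketch (the helper names are hypothetical, not part of scikit-learn), the check compares the first len(fit columns) names of the prediction frame with the fit-time names, position by position, so having the same number of columns is not enough:

```python
def column_order_matches(fit_columns, transform_columns):
    """Mirror of the condition shown in the traceback.

    Returns False exactly when that check would raise: the transform-time
    frame has at least as many columns as at fit time, and its first
    len(fit_columns) names differ position by position from fit_columns.
    """
    fit_columns = list(fit_columns)
    transform_columns = list(transform_columns)
    n_fit = len(fit_columns)
    if len(transform_columns) >= n_fit and transform_columns[:n_fit] != fit_columns:
        return False
    return True


def mismatched_positions(fit_columns, transform_columns):
    """List (position, fit name, transform name) wherever the names differ."""
    return [
        (i, f, t)
        for i, (f, t) in enumerate(zip(fit_columns, transform_columns))
        if f != t
    ]


fit_cols = ["B", "A", "C"]      # hypothetical training column order
request_cols = ["A", "B", "C"]  # same names, same count, different order
print(column_order_matches(fit_cols, request_cols))  # False
print(mismatched_positions(fit_cols, request_cols))  # [(0, 'B', 'A'), (1, 'A', 'B')]
```

Comparing mismatched_positions(X_train.columns, df_resp.columns) in the real code would show which names moved.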

Function:

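import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numeric_features, categorical_features, X_train, y_train and request
# are assumed to be defined earlier in the function (not shown in the question)
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])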

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Put the data transformation and the model in a single pipeline
rf = Pipeline(steps=[('preprocessor', preprocessor),
                     ('classifier', RandomForestClassifier(
                                        n_estimators=500,
                                        criterion="gini",
                                        max_features="sqrt",
                                        min_samples_leaf=4))])

rf.fit(X_train, y_train)

request_data = {'A': [request.A],
                'B': [request.B],
                'C': [request.C],
                'D': [request.D],
                'E': [request.E],
                'F': [request.F],
                'G': [request.G],
                'H': [request.H],
                'I': [request.I],
                'J': [request.J],
                'K': [request.K],
                'L': [request.L],
                'M': [request.M],
                'N': [request.N],
                'O': [request.O]}

df_resp = pd.DataFrame(data=request_data)
response = rf.predict(df_resp)

output = {"Safety Rating": response[0]}

return output
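The ColumnTransformer above never sets remainder, so it uses the default remainder='drop', and any column listed in neither numeric_features nor categorical_features is handled by that remainder. Since the error message ties the ordering check to the remainder, it can help to see which columns fall into it. A stdlib-only sketch; the helper name and the feature lists below are hypothetical:

```python
def remainder_columns(all_columns, numeric_features, categorical_features):
    """Hypothetical helper: columns not claimed by any transformer.

    With ColumnTransformer's default remainder='drop', these are the
    columns that end up in the remainder.
    """
    used = set(numeric_features) | set(categorical_features)
    return [col for col in all_columns if col not in used]


# Made-up feature lists for the 15 columns 'A'..'O'
all_columns = list("ABCDEFGHIJKLMNO")
numeric_features = ["A", "B", "C", "D", "E"]
categorical_features = ["F", "G", "H", "I", "J", "K"]
print(remainder_columns(all_columns, numeric_features, categorical_features))
# ['L', 'M', 'N', 'O']
```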

Answer 1:


From the error message, X_train.columns and df_resp.columns do not match (here, their order differs), but .predict() needs them to.

To enforce this, you can pass the column list of X_train as the columns argument when creating the DataFrame:

pd.DataFrame(data=request_data, columns=X_train.columns)
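A small sketch of this fix, with made-up column names standing in for the question's real X_train (assumption: the request arrives as a dict of one-element lists, as in the question). One plausible reason for the reordering, given the python3.5 paths in the traceback: on Python versions before 3.6, pandas builds a DataFrame from a dict with its columns sorted rather than in insertion order, so df_resp need not follow X_train's order if the real column names are not alphabetical.

```python
import pandas as pd

# Hypothetical training frame whose column order is not alphabetical
X_train = pd.DataFrame({"C": [1.0, 2.0], "A": ["x", "y"], "B": [3, 4]})

# Request data arriving as a dict, in a different key order
request_data = {"A": ["x"], "B": [5], "C": [1.5]}

# Answer 1's fix: let the constructor select and order the columns like X_train
df_resp = pd.DataFrame(data=request_data, columns=X_train.columns)

# Stricter alternative: build first, then select by name.
# This raises KeyError if a training column is missing from the request.
df_strict = pd.DataFrame(data=request_data)[list(X_train.columns)]

print(list(df_resp.columns))    # ['C', 'A', 'B']
print(list(df_strict.columns))  # ['C', 'A', 'B']
```

With dict data, columns=... fills any name missing from the dict with NaN instead of failing, so the stricter selection may be preferable when the request comes from an external client.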




Answer 2:


You can use the following generic function to put the columns in the right order:

def rearrange_columns(df, first_order="categorical"):
    """
    The ColumnTransformer of a scikit-learn Pipeline changes the order of the DataFrame columns.
    Use this function to reorder the feature columns so they are consistent with the output of the pipeline.
    Returns the reordered list of column names.
    """
    # Positions of the categorical (object dtype) columns
    cat_ix = [ii for ii, col in enumerate(df.columns.values) if df[col].dtypes == "object"]
    # Positions of all remaining columns, treated as numeric
    num_ix = [ii for ii, col in enumerate(df.columns.values) if ii not in cat_ix]
    new_order = cat_ix + num_ix if first_order == "categorical" else num_ix + cat_ix
    return [df.columns.values[ii] for ii in new_order]
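The same reordering can be sketched more compactly with DataFrame.select_dtypes. This is an alternative formulation, not part of the original answer, and the function name is hypothetical; like the original, it treats every non-object column as numeric, so category or dedicated string dtypes would land in the "numeric" group.

```python
import pandas as pd


def rearrange_columns_by_dtype(df, first_order="categorical"):
    """Hypothetical variant of rearrange_columns built on select_dtypes."""
    # select_dtypes keeps the original relative order of the columns
    cat_cols = list(df.select_dtypes(include="object").columns)
    num_cols = list(df.select_dtypes(exclude="object").columns)
    return cat_cols + num_cols if first_order == "categorical" else num_cols + cat_cols


df = pd.DataFrame({
    "age": [30, 40],
    "city": pd.Series(["a", "b"], dtype="object"),  # explicit object dtype
    "income": [1.0, 2.0],
    "sex": pd.Series(["m", "f"], dtype="object"),
})
print(rearrange_columns_by_dtype(df))                        # ['city', 'sex', 'age', 'income']
print(rearrange_columns_by_dtype(df, first_order="numeric"))  # ['age', 'income', 'city', 'sex']
```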


Source: https://stackoverflow.com/questions/61001934/python-valueerror-columntransformer-column-ordering-is-not-equal
