Shape mismatch: if categories is an array, it has to be of shape (n_features,)

Submitted by 十年热恋 on 2021-01-29 17:01:04

Question


Here is the code I'm trying to execute to encode the values of the first column of my data set using dummy values.

import numpy as py
import matplotlib.pyplot as plt
import pandas as pd
 

DataSet = pd.read_csv('Data.csv')
x=DataSet.iloc[:, :-1].values
y=DataSet.iloc[:,3].values

from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values=py.nan,strategy='mean')
imputer=imputer.fit(x[:, 1:3])
x[:, 1:3]=imputer.transform(x[:, 1:3])


from sklearn.preprocessing import OneHotEncoder
onehotencoder=OneHotEncoder(categories=[0])
x=onehotencoder.fit_transform(x).toarray()

Here's the data I'm working on

France  44.0    72000.0
Spain   27.0    48000.0
Germany 30.0    54000.0
Spain   38.0    61000.0
Germany 40.0    63777.7
France  35.0    58000.0
Spain   38.777  52000.0
France  48.0    79000.0
Germany 50.0    83000.0
France  37.0    67000.0

I'm getting an error stating

Shape mismatch: if categories is an array, it has to be of shape (n_features,). 

Can anyone help me fix this?


Answer 1:


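Your second column doesn't appear to be a categorical feature. You should only one-hot encode features that take a finite number of discrete values, like the first column, which can only take a limited number of values ('Spain', 'Germany', 'France'). The error occurs because categories expects one list of category values per input column, so categories=[0] does not match the shape of x. If you only want to encode the first column, you can do: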

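import numpy as np
from sklearn.preprocessing import OneHotEncoder
# one list of category values for the single column being encoded
onehotencoder=OneHotEncoder(categories=[['France','Germany','Spain']])
# reshape the first column to 2-D, since the encoder expects (n_samples, n_features)
x_1=onehotencoder.fit_transform(x[:,0].reshape(-1, 1)).toarray()
# put the dummy columns in front of the remaining numeric columns
x = np.concatenate([x_1,x[:,1:]], axis=1)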

and then your data will be in the form:

France Germany Spain score
1      0       0     44.0
0      0       1     27.0
...
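
As an alternative to slicing and concatenating by hand, the same result can probably be reached with ColumnTransformer, which applies the encoder to column 0 only and passes the other columns through. This is only a minimal sketch, not the original answer's method: the three sample rows are copied from the question, the transformer name 'country' is an arbitrary label, and sparse_threshold=0 is set on the assumption that a dense array is wanted.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Three rows copied from the question's data: country, age, salary.
x = np.array([
    ['France', 44.0, 72000.0],
    ['Spain', 27.0, 48000.0],
    ['Germany', 30.0, 54000.0],
], dtype=object)

# Encode only column 0; leave the remaining columns untouched.
# 'country' is just a label for this step (an arbitrary name).
ct = ColumnTransformer(
    transformers=[('country', OneHotEncoder(), [0])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)

# Categories are inferred and sorted, so the dummy columns come out
# in the order France, Germany, Spain, followed by age and salary.
encoded = np.asarray(ct.fit_transform(x), dtype=float)
print(encoded)
```

Here the categories are inferred from the data instead of being listed explicitly, so passing categories is not needed at all.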

Also, you only have 3 columns in your data, but you're selecting the fourth column with y=DataSet.iloc[:,3].values (column indices start at 0, so .iloc[:,3] refers to the 4th column).
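
To illustrate the indexing point, here is a small sketch on a three-column frame. The column names Country, Age and Salary are assumptions (the question does not show the header row), and the two rows are copied from the question's data.

```python
import pandas as pd

# Hypothetical column names; the values come from the question.
df = pd.DataFrame({
    'Country': ['France', 'Spain'],
    'Age': [44.0, 27.0],
    'Salary': [72000.0, 48000.0],
})

# Valid positions are 0, 1 and 2, so position 3 is out of bounds.
try:
    df.iloc[:, 3]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

# Negative positions count from the end, so -1 is always the last column.
y = df.iloc[:, -1].values
print(out_of_bounds, y)
```

Using -1 for the target column avoids hard-coding a position that depends on how many columns the file has.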



Source: https://stackoverflow.com/questions/62633492/shape-mismatch-if-categories-is-an-array-it-has-to-be-of-shape-n-features
