Hi there. I just started with machine learning, using a simple example to try and learn. I want to classify the files on my disk based on file type by making use of
A simple solution that gives you the right shape automatically is, instead of using:
X=dataset.iloc[:, 0].values
You can use:
X=dataset.iloc[:, :-1].values
This works if you only have two columns and want the first one: the slice :-1 selects every column except the last, and because it is a slice rather than a single index, the result is already a 2D array.
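To make the shape difference concrete, here is a minimal sketch. The DataFrame, its column names, and its values are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical two-column dataset: one feature column and one target column.
dataset = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [2.0, 4.0, 6.0]})

# A single integer index selects one column and returns a 1D array.
X_1d = dataset.iloc[:, 0].values
print(X_1d.shape)  # (3,)

# A slice keeps the column axis, so the result is 2D even with one column.
X_2d = dataset.iloc[:, :-1].values
print(X_2d.shape)  # (3, 1)
```

Note that with more than two columns, :-1 would return all feature columns rather than just the first.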
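from sklearn.linear_model import LinearRegression

# dataset is assumed to be a pandas DataFrame loaded earlier
X = dataset.iloc[:, 0].values
y = dataset.iloc[:, 1].values
regressor = LinearRegression()
# reshape returns a new array, so assign the result back to X
X = X.reshape(-1, 1)
regressor.fit(X, y)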
This is the code I had. The reshape method does not work in place, so we have to assign the reshaped result back to X, as shown above.
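A small sketch of that point, using a made-up NumPy array: calling reshape without assigning the result leaves the original array's shape unchanged.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])

# reshape returns a new array object; the result is discarded here,
# so X keeps its original 1D shape.
X.reshape(-1, 1)
shape_before = X.shape
print(shape_before)  # (3,)

# Assigning the result back to X keeps the 2D shape.
X = X.reshape(-1, 1)
print(X.shape)  # (3, 1)
```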
When passing your input to the classifiers, pass 2D arrays (of shape (M, N) where N >= 1), not 1D arrays (which have shape (N,)). The error message is pretty clear: reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
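As a rough illustration of the two reshape forms, the sketch below uses made-up data and LinearRegression as a stand-in estimator (the estimator choice is an assumption for the example; other scikit-learn estimators validate input the same way).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0])  # 1D array, shape (4,)
y = np.array([2.0, 4.0, 6.0, 8.0])

# Four samples with one feature each: shape (4, 1).
as_single_feature = x.reshape(-1, 1)
# One sample with four features: shape (1, 4).
as_single_sample = x.reshape(1, -1)

# Fitting on the 1D array is rejected with a ValueError.
try:
    LinearRegression().fit(x, y)
    raised = False
except ValueError:
    raised = True

# Fitting on the (4, 1) array works: one feature, four samples.
model = LinearRegression().fit(as_single_feature, y)
print(as_single_feature.shape, as_single_sample.shape, raised)
```

Which form is right depends on what the values mean: use (-1, 1) when each value is a separate observation, and (1, -1) when all values describe one observation.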
from sklearn.model_selection import train_test_split
# X.shape should be (N, M) where M >= 1
X = mydata[['script']]
# y.shape should be (N,)
y = mydata['label']
# perform label encoding if "label" contains strings
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.33, random_state=42)
...
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
Some other helpful tips -