Hi there. I just started with machine learning and am working through a simple example to learn. I want to classify the files on my disk based on file type by making use of
When passing your input to the classifiers, pass 2D arrays (of shape (M, N) where N >= 1), not 1D arrays (which have shape (N,)). The error message is pretty clear:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
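To make the two reshapes from the error message concrete, here is a small NumPy sketch. The array `a` and its values are made up purely for illustration.

```python
import numpy as np

# Hypothetical 1D array of four values, shape (4,)
a = np.array([1.0, 2.0, 3.0, 4.0])

# Four samples with one feature each -> shape (4, 1)
single_feature = a.reshape(-1, 1)

# One sample with four features -> shape (1, 4)
single_sample = a.reshape(1, -1)

print(a.shape, single_feature.shape, single_sample.shape)
```

Which of the two you need depends on what the values mean: a column of measurements of one feature calls for reshape(-1, 1), while a single row to predict on calls for reshape(1, -1).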
import pandas as pd  # only needed for the optional pd.factorize step below
from sklearn.model_selection import train_test_split
# X.shape should be (N, M) where M >= 1
X = mydata[['script']]
# y.shape should be (N,): scikit-learn expects a 1D target
y = mydata['label']
# perform label encoding if "label" contains strings
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.33, random_state=42)
...
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
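For reference, here is a self-contained version of the snippet above that runs end to end. It is only a sketch: the contents of `mydata` are invented, the `script` column is assumed to be numeric, and `DecisionTreeClassifier` is a stand-in, since the answer does not say which classifier `clf` is.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: one numeric feature and a string label
mydata = pd.DataFrame({
    "script": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "label": ["a"] * 6 + ["b"] * 6,
})

X = mydata[["script"]]  # double brackets keep X 2D: shape (12, 1)
y = mydata["label"]     # single brackets give a 1D target: shape (12,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Assumed classifier; swap in whichever estimator you are actually using
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

score = clf.score(X_test, y_test)  # mean accuracy on the held-out rows
print(score)
```

Note that `mydata['script']` (single brackets) would give a 1D Series and trigger the reshape error, while `mydata[['script']]` gives a one-column DataFrame that scikit-learn accepts.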
Some other helpful tips -