How to fix “ValueError: Expected 2D array, got 1D array instead” in sklearn/python?

鱼传尺愫 2021-01-13 21:51

Hi there. I just started with machine learning, using a simple example to try and learn. I want to classify the files on my disk based on their file type by making use of

3 answers
  •  臣服心动
    2021-01-13 22:57

    When passing your input to the classifiers, pass 2D arrays (of shape (M, N), where M is the number of samples and N >= 1 is the number of features), not 1D arrays (which have shape (M,)). The error message is pretty clear:

    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
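    As a minimal sketch of what those two reshape calls do, assuming a hypothetical 1D NumPy array named `values` (not from the question), the shapes change like this:

```python
import numpy as np

# Hypothetical data: five measurements stored as a 1D array.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(values.shape)      # (5,)

# Single feature, many samples: one row per sample, one column.
as_column = values.reshape(-1, 1)
print(as_column.shape)   # (5, 1)

# Single sample, many features: one row, one column per feature.
as_row = values.reshape(1, -1)
print(as_row.shape)      # (1, 5)
```

    The `-1` tells NumPy to infer that dimension from the array's length, so the same call works whatever the number of elements.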

    from sklearn.model_selection import train_test_split
    
    # X.shape should be (N, M) where M >= 1
    X = mydata[['script']]
    # y.shape should be (N,): scikit-learn expects a 1D target array
    y = mydata['label']
    # perform label encoding if "label" contains strings (needs `import pandas as pd`)
    # y = pd.factorize(mydata['label'])[0]
    X_train, X_test, y_train, y_test = train_test_split(
                          X, y, test_size=0.33, random_state=42)
    ...
    
    clf.fit(X_train, y_train) 
    print(clf.score(X_test, y_test))
    

    Some other helpful tips:

    1. Split your data into separate train and test portions. Do not test on your training data: that leads to overly optimistic estimates of your classifier's performance.
    2. I'd recommend factorizing your labels, so you're dealing with integers. It's just easier.
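
    Both tips can be sketched together as follows. The toy DataFrame, its `size_kb` column, and the choice of `DecisionTreeClassifier` are assumptions for illustration only; the question's real data and classifier are not shown here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for `mydata`.
mydata = pd.DataFrame({
    "size_kb": [1, 2, 120, 150, 3, 140],
    "label": ["text", "text", "image", "image", "text", "image"],
})

# Double brackets keep X two-dimensional: shape (6, 1).
X = mydata[["size_kb"]]

# factorize maps each distinct string to an integer code (1D array)
# and also returns the distinct labels in order of first appearance.
codes, uniques = pd.factorize(mydata["label"])

# Hold out a test portion so the score is not computed on training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, codes, test_size=0.33, random_state=42)

clf = DecisionTreeClassifier(random_state=0)  # assumed classifier
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(score)
```

    Indexing `uniques` with the predicted codes recovers the original string labels, for example `uniques[clf.predict(X_test)]`.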
