k-nearest neighbors with cross-validation for accuracy score and confusion matrix

生来不讨喜 2021-01-24 08:12

I have the following data, where each column is one sample: the rows with numbers are the inputs and the letter in the first row is the output (class label).

A,A,A,B,B,B
-0.979090189,0.338819904,-0.253746         
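A minimal sketch, assuming the file has the layout shown above (class labels in the first row, one sample per column), of reshaping it into the one-row-per-sample layout that scikit-learn estimators expect. `raw` is a hypothetical in-memory stand-in for the real CSV, and its numbers are made-up placeholders, not the question's data.

```python
import io

import pandas as pd

# Hypothetical CSV in the question's layout: first row = class labels,
# each column = one sample (the numbers are placeholders)
raw = io.StringIO(
    "A,A,A,B,B,B\n"
    "-0.98,0.34,-0.25,1.10,0.70,0.95\n"
    "0.12,-0.40,0.05,0.88,1.20,0.60\n"
)

# header=None keeps the label row as ordinary data; .T turns columns into rows
df = pd.read_csv(raw, header=None).T

y = df.iloc[:, 0].to_numpy()                    # first column: labels 'A'/'B'
X = df.iloc[:, 1:].astype(float).to_numpy()     # remaining columns: numeric features

print(X.shape, y)
```

Because the label row contains letters, pandas reads every column as text, which is why the feature part is converted with `astype(float)` after the transpose.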


        
1 Answer
  •  星月不相逢
    2021-01-24 08:43

    I think your model does not get trained properly: with leave-one-out (LOO) each test set contains a single sample, so the model only has to guess one value, and if it gets that value wrong the accuracy for that split is 0. May I suggest switching to KFold or StratifiedKFold? LOO also has the disadvantage that it fits one model per sample, so for large datasets it becomes extremely time-consuming. Here is what happened when I implemented StratifiedKFold with 3 splits on your X data. I randomly filled y with 0 and 1 instead of using A and B, and I have not transposed the data, so it has 12 rows:

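    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import StratifiedKFold
    import pandas as pd
    
    # Raw string so the backslash in the Windows path is not read as an escape sequence
    csv = r'C:\df_low_X.csv'
    df = pd.read_csv(csv, header=None)
    print(df)
    
    # All columns except the last are features; the last column is the label (0/1)
    X = df.iloc[:, :-1].values
    y = df.iloc[:, -1].values
    
    clf = KNeighborsClassifier()        # default n_neighbors=5
    kf = StratifiedKFold(n_splits=3)    # keeps the 0/1 ratio of y roughly equal in every fold
    
    ac = []  # accuracy of each fold
    cm = []  # confusion matrix of each fold
    
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        print(X_train, X_test)
        # fit() retrains the classifier from scratch on this fold's training rows
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        ac.append(accuracy_score(y_test, y_pred))
        cm.append(confusion_matrix(y_test, y_pred))
    print(ac)
    print(cm)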
    
    # ac
    [0.25, 0.75, 0.5]
    
    # cm
    [array([[1, 1],
           [2, 0]], dtype=int64),
     array([[1, 1],
           [0, 2]], dtype=int64),
     array([[0, 2],
           [0, 2]], dtype=int64)]
    
