ValueError: Found arrays with inconsistent numbers of samples

Question

Here is my code:

import pandas as pa
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

def get_accuracy(X_train, y_train, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    result = accuracy_score(y_train, y_test)
    return result

test_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-test.csv")
test_data.columns = ["class", "f1", "f2"]
train_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-train.csv")
train_data.columns = ["class", "f1", "f2"]

accuracy = get_accuracy(train_data[train_data.columns[1:]], train_data[train_data.columns[0]], test_data[test_data.columns[0]])
print(accuracy)

I don't understand why I get this error:

Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 35, in <module>
    accuracy = get_accuracy(train_data[train_data.columns[1:]], 
train_data[train_data.columns[0]], test_data[test_data.columns[0]])
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 22, in get_accuracy
    result = accuracy_score(y_train, y_test)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\metrics\classification.py", line 72, in _check_targets
    check_consistent_length(y_true, y_pred)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [199 299]

I want to compute the accuracy with accuracy_score, but I get this error. I Googled it but could not find anything that helps. Can someone explain what is happening?

Answer 1:

sklearn.metrics.accuracy_score() takes y_true and y_pred arguments. That is, for the same data set (presumably the test set), it wants to know the ground truth and the values predicted by your model. This will allow it to evaluate how well your model has performed compared to a hypothetical perfect model.

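In your code, you are passing the true outcome variables of two different data sets (the training labels and the test labels). The two arrays have different lengths (199 and 299), which is what triggers the ValueError. Even if the lengths happened to match, both arrays are ground truth, so comparing them would in no way reflect your model's ability to correctly classify observations!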
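As a minimal sketch of that contract, the snippet below calls accuracy_score() once with matching lengths and once with mismatched lengths. The label values are made up for illustration and are not taken from the question's CSV files; the exact wording of the error message may differ between scikit-learn versions.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels, invented for this illustration.
y_true = [1, -1, 1, 1]    # ground truth for four samples
y_pred = [1, -1, -1, 1]   # model predictions for the same four samples

# Same length: the score is the fraction of positions that agree (3 of 4).
acc = accuracy_score(y_true, y_pred)
print(acc)

# Different lengths: accuracy_score refuses to compare the arrays.
try:
    accuracy_score([1, -1, 1], [1, -1])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```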

If you update get_accuracy() to also take X_test as a parameter, I think this is more in line with what you intended to do:

def get_accuracy(X_train, y_train, X_test, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    pred_test = perceptron.predict(X_test)
    result = accuracy_score(y_test, pred_test)
    return result
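A possible way to call the updated function is sketched below. Because the question's CSV files are not available here, the sketch builds small synthetic DataFrames with the same column names ("class", "f1", "f2"); the data, the row counts, and the labelling rule are assumptions made only for this example. The train and test sets deliberately have different numbers of rows, which is fine because y_test and the predictions both come from the test set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

def get_accuracy(X_train, y_train, X_test, y_test):
    perceptron = Perceptron(random_state=241)
    perceptron.fit(X_train, y_train)
    pred_test = perceptron.predict(X_test)   # predictions for the test set
    return accuracy_score(y_test, pred_test)

def make_data(n_rows, seed):
    # Synthetic stand-in for the CSV files: class is +1 when f1 + f2 > 0, else -1.
    rng = np.random.RandomState(seed)
    features = rng.uniform(-1, 1, size=(n_rows, 2))
    labels = np.where(features.sum(axis=1) > 0, 1, -1)
    return pd.DataFrame({"class": labels, "f1": features[:, 0], "f2": features[:, 1]})

train_data = make_data(30, seed=0)
test_data = make_data(20, seed=1)

accuracy = get_accuracy(
    train_data[train_data.columns[1:]],   # X_train: f1, f2
    train_data[train_data.columns[0]],    # y_train: class
    test_data[test_data.columns[1:]],     # X_test: f1, f2
    test_data[test_data.columns[0]],      # y_test: class
)
print(accuracy)
```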

Source: https://stackoverflow.com/questions/35247687/valueerror-found-arrays-with-inconsistent-numbers-of-samples

Tags

python

pandas

machine-learning

scikit-learn

perceptron