Simple model error on fit: Found input variables with inconsistent numbers of samples

Submitted by 一笑奈何 on 2020-05-17 08:52:08

Question


I know this question exists in various forms, but after searching the web for several days, I still haven't found anything that solves my problem.

This is my notebook:

import numpy as np
import pandas as pd

X = pd.read_csv('../input/web-traffic-time-series-forecasting/train_1.csv.zip')
X = X.drop('Page', axis=1)
X.fillna(0, inplace=True, axis=0)

X_sliced = X.iloc[:, 0:367]
y_sliced = X.iloc[:, 367:-1]

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

linreg = LinearRegression()

X_sliced.drop(X_sliced.iloc[:, 182:367], inplace=True, axis=1) #Here, I make sure that my X_sliced has the same shape as y_sliced

X_sliced.shape

OUT: (145063, 182)

y_sliced.shape

OUT: (145063, 182)

X_train, y_train, X_test, y_test = train_test_split(X_sliced, y_sliced)
linreg.fit(X_train, y_train)

ValueError: Found input variables with inconsistent numbers of samples: [108797, 36266]

Why do I receive this error when the shapes of my dataframes are exactly the same?

Link to original assignment on kaggle: https://www.kaggle.com/c/web-traffic-time-series-forecasting/overview


Answer 1:


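You've assigned the outputs of train_test_split in the wrong order. The function returns both splits of the first array, then both splits of the second (X_train, X_test, y_train, y_test). In your code the variable named y_train therefore holds the test split of X (36266 rows), which cannot be paired with the 108797 rows in X_train. It should be: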

X_train, X_test, y_train, y_test = train_test_split(X_sliced, y_sliced) # x, x, y, y not x, y, x, y
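To make the difference concrete, here is a minimal sketch that reproduces the error and then applies the fix. It assumes a small random array as a stand-in for the Kaggle data (100 rows, 5 columns each for X and y); the `*_wrong` variable names and `error_message` are illustrative only and are not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 100 samples,
# 5 feature columns and 5 target columns.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.random((100, 5))

# Wrong order, as in the question: the second name receives X's test split.
X_tr_wrong, y_tr_wrong, X_te_wrong, y_te_wrong = train_test_split(X, y, random_state=0)
try:
    LinearRegression().fit(X_tr_wrong, y_tr_wrong)
    error_message = None
except ValueError as exc:
    # Row counts differ (75 vs 25), so fit() rejects the pair.
    error_message = str(exc)
print(error_message)

# Correct order: both splits of X first, then both splits of y.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

linreg = LinearRegression()
linreg.fit(X_train, y_train)
pred = linreg.predict(X_test)
print(pred.shape)
```

With the default test_size of 0.25, the 100 rows are split into 75 training and 25 test rows, which mirrors the 108797 / 36266 split reported in the error message above.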


Source: https://stackoverflow.com/questions/60927250/simple-model-error-on-fit-found-input-variables-with-inconsistent-numbers-of-sa
