StatsModel logit.predict error: Number of rows mismatch between data argument and new values

Submitted by 余生长醉 on 2020-01-24 01:34:27

Question


I have a train dataframe (227845 rows) and a test dataframe (56962 rows). I want to fit a statsmodels logit regression on the train data and then predict the values for the test data. After fitting the model, calling predict on the test data raises this error:

PatsyError: Number of rows mismatch between data argument and train.loc[:, train.columns != 'Class'] (56962 versus 227845)
train['Class'] ~ train.loc[:, train.columns != 'Class']

My steps for the analysis are:

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.formula.api import logit
from sklearn.model_selection import train_test_split

dataS = pd.read_csv('sample.csv')
train, test = train_test_split(dataS, test_size=0.3, random_state=0)

Data Columns:

['Time' 'V1' 'V2' 'V3' 'V4' 'V5' 'V6' 'V7' 'V8' 'V9' 'V10' 'V11' 'V12'
 'V13' 'V14' 'V15' 'V16' 'V17' 'V18' 'V19' 'V20' 'V21' 'V22' 'V23' 'V24'
 'V25' 'V26' 'V27' 'V28' 'Amount' 'Class']  

mod = logit("train['Class'] ~ train.loc[:, train.columns != 'Class']", data=train).fit()

predictions = mod.predict(test.loc[:, test.columns != 'Class'])
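A likely cause of the error above is that the formula refers to the DataFrame variable itself rather than to column names. Patsy then evaluates that expression against the original variable again at predict time, so the right-hand side still has the training row count while the data argument has the test row count. The following is a minimal sketch of one way around this, assuming the formula can be built from plain column names. The synthetic frame, its three predictor columns, and the pandas-based split are illustrative stand-ins, not the asker's data or split.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for sample.csv (hypothetical data, not the asker's file).
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.normal(size=n),
})
# Binary target with a noisy dependence on V1 and V2.
linear = 0.8 * data["V1"] - 0.5 * data["V2"]
data["Class"] = (rng.random(n) < 1 / (1 + np.exp(-linear))).astype(int)

# Simple 80/20 split; train_test_split would work equally well here.
train = data.sample(frac=0.8, random_state=0)
test = data.drop(train.index)

# Build the formula from column names so patsy looks them up in `data=`.
predictors = [c for c in train.columns if c != "Class"]
formula = "Class ~ " + " + ".join(predictors)

mod = smf.logit(formula, data=train).fit(disp=0)

# predict() re-evaluates the same column names against the new frame.
predictions = mod.predict(test)
```

Because the formula only names columns, predict() can look them up in whatever DataFrame is passed, and the result has one predicted probability per test row.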

I also tried sklearn.linear_model.LogisticRegression and the statsmodels Logit class. In both cases predict() worked fine, but the fit statistics were not what I expected when compared with the smf model.

Can someone help?

Source: https://stackoverflow.com/questions/48130196/statsmodel-logit-predict-error-number-of-rows-mismatch-between-data-argument-an
