Question
I have to predict the type of program a student is in based on other attributes.
prog is a categorical variable indicating what type of program a student is in: “General” (1), “Academic” (2), or “Vocational” (3).
Ses is a categorical variable indicating the student’s socioeconomic class: “Low” (1), “Middle” (2), or “High” (3).
read, write, math, and science are the student’s scores on different tests.
honors indicates whether or not the student has enrolled.
The CSV file was posted as an image.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# df holds the data from the CSV file (loading not shown)
df1 = pd.get_dummies(df, drop_first=True)

# Features: everything except the two dummy columns derived from prog
X = df1.drop(columns=['prog_general', 'prog_vocation'], axis=1)
# Target: the two prog dummy columns, so y has two columns
y = df1.loc[:, ['prog_general', 'prog_vocation']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

clf = LogisticRegression(multi_class='multinomial', solver='newton-cg')
model = clf.fit(X_train, y_train)
But here I am getting the following error:
ValueError: bad input shape (140, 2).
Answer 1:
LogisticRegression does not handle multiple targets on its own. This is not true of every model in scikit-learn, though: several tree-based models, such as DecisionTreeClassifier, support multi-output targets natively.
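As a minimal sketch of the native multi-output case, the snippet below fits a DecisionTreeClassifier directly on a two-dimensional target. The data is synthetic (generated with make_multilabel_classification) and only stands in for a real dataset; the variable names are illustrative.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: y has one binary column per label, so its shape is (100, 3).
X, y = make_multilabel_classification(n_samples=100, n_classes=3, random_state=0)

# The tree accepts the 2-D target as-is, with no wrapper around it.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Predictions come back with one column per target.
pred = tree.predict(X[-2:])
print(pred.shape)
```

Passing the same y to a bare LogisticRegression is what triggers the "bad input shape" error from the question.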
To make this work for LogisticRegression, you need the MultiOutputClassifier wrapper.
Example:
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression
X, y = make_multilabel_classification(n_classes=3, random_state=0)
clf = MultiOutputClassifier(estimator=LogisticRegression()).fit(X, y)
clf.predict(X[-2:])
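Applied to a frame shaped like the one in the question, the wrapper could be used roughly as follows. This is a sketch under assumptions: the real CSV is not shown, so the DataFrame here is random stand-in data, and the category labels ("academic", "general", "vocation", and so on) and max_iter=1000 are guesses chosen to reproduce the prog_general and prog_vocation column names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Hypothetical stand-in for the question's data; values are random.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "prog": rng.choice(["academic", "general", "vocation"], size=n),
    "ses": rng.choice(["low", "middle", "high"], size=n),
    "read": rng.integers(30, 80, size=n),
    "write": rng.integers(30, 80, size=n),
    "math": rng.integers(30, 80, size=n),
    "science": rng.integers(30, 80, size=n),
    "honors": rng.choice(["enrolled", "not enrolled"], size=n),
})

# Same encoding as in the question: the first level of each category is dropped.
df1 = pd.get_dummies(df, drop_first=True)
target_cols = ["prog_general", "prog_vocation"]
X = df1.drop(columns=target_cols).astype(float)
y = df1[target_cols].astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# The wrapper fits one independent LogisticRegression per target column.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
pred = clf.predict(X_test)
print(pred.shape)
```

Because each column is modeled independently, a row can come back with both dummies set to 1. If that matters, keeping prog as a single label column is an alternative, since LogisticRegression accepts a one-dimensional multiclass target.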
Source: https://stackoverflow.com/questions/61977692/how-to-use-multinomial-logistic-regression-for-multilabel-classification-problem