Question
I am trying to model a classifier for a multi-class classification problem (3 classes) using LightGBM in Python. I used the following parameters:
params = {'task': 'train',
          'boosting_type': 'gbdt',
          'objective': 'multiclass',
          'num_class': 3,
          'metric': 'multi_logloss',
          'learning_rate': 0.002296,
          'max_depth': 7,
          'num_leaves': 17,
          'feature_fraction': 0.4,
          'bagging_fraction': 0.6,
          'bagging_freq': 17}
All the categorical features of the dataset are label encoded with LabelEncoder.
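For reference, a minimal sketch of that encoding step; df and cat_col are hypothetical names, not from the original post:
from sklearn.preprocessing import LabelEncoder

# Hypothetical example: integer-encode one categorical column in place
le = LabelEncoder()
df['cat_col'] = le.fit_transform(df['cat_col'])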
I trained the model after running cv with early_stopping, as shown below.
lgb_cv = lgbm.cv(params, d_train, num_boost_round=10000, nfold=3, shuffle=True,
                 stratified=True, verbose_eval=20, early_stopping_rounds=100)
# index (0-based) of the iteration with the lowest mean multi_logloss
nround = lgb_cv['multi_logloss-mean'].index(np.min(lgb_cv['multi_logloss-mean']))
print(nround)
model = lgbm.train(params, d_train, num_boost_round=nround)
After training, I made predictions with the model like this:
preds = model.predict(test)
print(preds)
I got a nested array as output, like this:
[[ 7.93856847e-06 9.99989550e-01 2.51164967e-06]
[ 7.26332978e-01 1.65316511e-05 2.73650491e-01]
[ 7.28564308e-01 8.36756769e-06 2.71427325e-01]
...,
[ 7.26892634e-01 1.26915179e-05 2.73094674e-01]
[ 5.93217601e-01 2.07172044e-04 4.06575227e-01]
[ 5.91722491e-05 9.99883828e-01 5.69994435e-05]]
As each list in preds represents the class probabilities, I used np.argmax() to find the classes, like this:
predictions = []
for x in preds:
    predictions.append(np.argmax(x))
While analyzing the predictions I found that they contain only 2 classes: 0 and 1. Class 2 was the second-largest class in the training set, but it was nowhere to be found in the predictions. On evaluating the result, the model gave about 78% accuracy.
So, why didn't my model predict class 2 for any of the cases? Is there anything wrong with the parameters I used? Is this not the proper way to interpret the predictions made by the model? Should I change any of the parameters?
Answer 1:
Try troubleshooting by swapping classes 0 and 2, and re-running the training and prediction process.
If the new predictions contain only classes 1 and 2 (most likely, given the data you provided):
- The classifier may not have learned the third class; perhaps its features overlap with those of a larger class, and the classifier defaults to the larger class in order to minimise the objective function. Try providing a balanced training set (the same number of samples per class) and retry (see the sketch after this list).
If the new predictions do contain all 3 classes:
- Something went wrong in your code somewhere. More information is needed to determine what exactly went wrong.
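A minimal sketch of one way to build such a balanced set, by downsampling each class to the size of the smallest one; X and y are hypothetical NumPy arrays of features and labels, not from the original post:
import numpy as np

rng = np.random.default_rng(0)
counts = np.bincount(y)   # samples per class
n_min = counts.min()      # size of the smallest class

# Draw n_min samples (without replacement) from each class
balanced_idx = np.concatenate([
    rng.choice(np.where(y == c)[0], size=n_min, replace=False)
    for c in np.unique(y)
])
X_bal, y_bal = X[balanced_idx], y[balanced_idx]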
Hope this helps.
Answer 2:
From the output you provided, there seems to be nothing wrong with the predictions.
The model produces three probabilities, as you show, and just from the first output row you provided, [7.93856847e-06 9.99989550e-01 2.51164967e-06], the second class has the highest probability, so I can't see the problem here.
Class 0 is the first class, class 1 is the second class (what you may be counting as class 2), and class 2 is the third. So I guess nothing is wrong.
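A quick illustration of that 0-based indexing, using the first row of the output above:
import numpy as np

row = np.array([7.93856847e-06, 9.99989550e-01, 2.51164967e-06])
print(np.argmax(row))  # 1 -> the second class, i.e. label 1, not label 2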
Answer 3:
The solution is:
best_preds_svm = [np.argmax(line) for line in preds]
Then you can print the class with the highest predicted probability for each sample.
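For example, to score these hard predictions against held-out labels (y_test is a hypothetical name, not from the original post):
from sklearn.metrics import accuracy_score

# Compare the argmax predictions with the true test labels
print(accuracy_score(y_test, best_preds_svm))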
Answer 4:
import numpy as np
import pandas as pd

pd.DataFrame(preds).apply(lambda x: np.argmax(x), axis=1)
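Equivalently, NumPy can do this in one vectorized call, without the row-wise apply:
import numpy as np

predictions = np.argmax(preds, axis=1)  # one class index per row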
Source: https://stackoverflow.com/questions/47370240/multiclass-classification-with-lightgbm