(An update to this question has been added.)
I am a graduate student at the University of Ghent, Belgium; my research is about emotion recognition w
Why don't you use the InfogainLoss layer to compensate for the imbalance in your training set?
The Infogain loss is defined using a weight matrix H (in your case 2-by-2). The meaning of its entries is:
[cost of predicting 1 when gt is 0,  cost of predicting 0 when gt is 0
 cost of predicting 1 when gt is 1,  cost of predicting 0 when gt is 1]
So, you can set the entries of H to reflect the difference between an error in predicting 0 and an error in predicting 1.
You can find how to define matrix H for Caffe in this thread.
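If it helps, here is a rough sketch of how such an H matrix could be saved from Python as a binaryproto blob that an InfogainLoss layer can load; the cost values and the file name are placeholders I chose for illustration, not values from your setup:

import numpy as np
import caffe

# fill in the four costs as described above; these numbers are only placeholders
H = np.array([[1.0, 5.0],
              [5.0, 1.0]], dtype='f4')

# InfogainLoss reads H as a blob, typically stored in a .binaryproto file
blob = caffe.io.array_to_blobproto(H.reshape((1, 1, 2, 2)))
with open('infogain_H.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())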
Regarding sample weights, you may find this post interesting: it shows how to modify the SoftmaxWithLoss layer to take into account sample weights.
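The linked post patches Caffe's SoftmaxWithLoss layer in C++; purely as an illustration of the same idea, a per-sample-weighted cross-entropy can be sketched in a few lines of PyTorch (the function name and the normalization by the weight sum are my own choices):

import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, sample_weights):
    # per-example loss, then scale each example by its own weight
    per_example = F.cross_entropy(logits, targets, reduction='none')
    return (per_example * sample_weights).sum() / sample_weights.sum()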
Recently, a modification to the cross-entropy loss was proposed by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár: Focal Loss for Dense Object Detection (ICCV 2017).
The idea behind focal loss is to assign a different weight to each example based on the relative difficulty of predicting that example (rather than based on class size, etc.). From the brief time I got to experiment with this loss, it feels superior to "InfogainLoss" with class-size weights.
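For reference, the paper's core formula is FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. A minimal sketch of this in PyTorch follows; the function name, parameter names, and the optional per-class alpha weights are my own illustration, not code from the paper:

import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    # logits: (N, C) raw scores, targets: (N,) class indices
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt   # (1 - pt)^gamma down-weights easy examples
    if alpha is not None:                    # optional per-class weight tensor, shape (C,)
        loss = loss * alpha[targets]
    return loss.mean()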
I have also come across this class imbalance problem in my classification task. Right now I am using CrossEntropyLoss with the weight argument (documentation here) and it works fine. The idea is to give more loss to samples from classes with a smaller number of images: the weight for each class is inversely proportional to the number of images in that class. Here is a snippet to calculate the weight for all classes using numpy:
import numpy as np

# train_labels is a list of class labels for all training samples
# the labels are in range [0, n-1] (n classes in total)
train_labels = np.asarray(train_labels)
num_cls = np.unique(train_labels).size

# count the number of samples in each class
cls_num = []
for i in range(num_cls):
    cls_num.append(len(np.where(train_labels == i)[0]))
cls_num = np.array(cls_num)

# inverse frequency: rarer classes get larger weights
cls_num = cls_num.max() / cls_num

# normalize so the weights sum to 1; `weight` is the array with one
# entry per class to use in CrossEntropyLoss
x = 1.0 / np.sum(cls_num)
weight = x * cls_num
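To connect this to the loss, the resulting array can then be passed to CrossEntropyLoss roughly like this (converting it to a float tensor first; the variable names are placeholders):

import torch
import torch.nn as nn

class_weights = torch.tensor(weight, dtype=torch.float32)  # `weight` from the snippet above
criterion = nn.CrossEntropyLoss(weight=class_weights)
# loss = criterion(logits, targets)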