Question
I have been trying to implement the softmax version of the triplet loss in Caffe described in
Hoffer and Ailon, Deep Metric Learning Using Triplet Network, ICLR 2015.
I have tried, but I am finding it hard to calculate the gradient, since the L2 norm in the exponent is not squared.
Can someone please help me here?
Answer 1:
Implementing the L2 norm using existing Caffe layers can save you all the hassle.
Here's one way to compute ||x1 - x2||_2 in Caffe for "bottom"s x1 and x2 (assuming x1 and x2 are B-by-C blobs, so this computes B norms of C-dimensional differences):
layer {
  name: "x1-x2"
  type: "Eltwise"
  bottom: "x1"
  bottom: "x2"
  top: "x1-x2"
  eltwise_param {
    operation: SUM
    coeff: 1 coeff: -1  # computes x1 - x2 element-wise
  }
}
layer {
  name: "sqr_norm"
  type: "Reduction"
  bottom: "x1-x2"
  top: "sqr_norm"
  # sum of squares along axis 1: yields a B-element blob of squared L2 norms
  reduction_param { operation: SUMSQ axis: 1 }
}
layer {
  name: "sqrt"
  type: "Power"
  bottom: "sqr_norm"
  top: "sqrt"
  power_param { power: 0.5 }  # element-wise square root -> the L2 norms
}
For the triplet loss defined in the paper, you need to compute the L2 norm for x - x+ and for x - x-, concatenate these two blobs, and feed the concatenated blob to a "Softmax" layer (a sketch follows below).
No need for dirty gradient computations.
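Here is a minimal sketch of that last step, assuming the norm pipeline above has been instantiated twice (once for x - x+ and once for x - x-) with its final tops renamed "dist_pos" and "dist_neg"; all blob and layer names here are illustrative, not from the original answer:

# Reduction outputs 1-D blobs of shape (B); reshape to B-by-1 so axis 1 exists for Concat
layer {
  name: "reshape_pos"
  type: "Reshape"
  bottom: "dist_pos"
  top: "dist_pos_2d"
  reshape_param { shape { dim: -1 dim: 1 } }
}
layer {
  name: "reshape_neg"
  type: "Reshape"
  bottom: "dist_neg"
  top: "dist_neg_2d"
  reshape_param { shape { dim: -1 dim: 1 } }
}
layer {
  name: "concat_dists"
  type: "Concat"
  bottom: "dist_pos_2d"
  bottom: "dist_neg_2d"
  top: "dists"
  concat_param { axis: 1 }  # B-by-2: one (d+, d-) pair per example
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "dists"
  top: "prob"
  softmax_param { axis: 1 }  # softmax over the two distances of each example
}

The paper then compares the softmax output to the constant vector (0, 1) with a mean-squared error; in Caffe that could be done, for instance, with a "EuclideanLoss" layer against a constant (0, 1) blob.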
Answer 2:
This is a math question, but here it goes. The first gradient below is the familiar one, for the squared norm; the second is what you get when the norm is not squared.
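In symbols, here is a sketch of the two gradients, with x and y standing for the two embeddings being compared (the second follows from the first by the chain rule on the square root):

\frac{\partial}{\partial x}\left\|x - y\right\|_2^2 = 2\,(x - y)

\frac{\partial}{\partial x}\left\|x - y\right\|_2 = \frac{\partial}{\partial x}\sqrt{\left\|x - y\right\|_2^2} = \frac{x - y}{\left\|x - y\right\|_2}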
Source: https://stackoverflow.com/questions/36277060/gradient-calculation-for-softmax-version-of-triplet-loss