I have been trying to implement the softmax version of the triplet loss in Caffe described in
Hoffer and Ailon, Deep Metric Learning Using Triplet Network, ICLR 2015.
Implementing the L2 norm using existing Caffe layers can save you all the hassle.
Here's one way to compute ||x1 - x2||_2 in Caffe for "bottom"s x1 and x2 (assuming x1 and x2 are B-by-C blobs, so B norms are computed over the C-dimensional diffs):
layer {
  name: "x1-x2"
  type: "Eltwise"
  bottom: "x1"
  bottom: "x2"
  top: "x1-x2"
  eltwise_param {
    operation: SUM
    coeff: 1 coeff: -1
  }
}
layer {
  name: "sqr_norm"
  type: "Reduction"
  bottom: "x1-x2"
  top: "sqr_norm"
  reduction_param { operation: SUMSQ axis: 1 }
}
layer {
  name: "sqrt"
  type: "Power"
  bottom: "sqr_norm"
  top: "sqrt"
  power_param { power: 0.5 }
}
For the triplet loss defined in the paper, you need to compute the L2 norm for x - x+ and for x - x-, concat these two blobs, and feed the concat blob to a "Softmax" layer.
No need for dirty gradient computations.
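For example, a rough sketch of that last step could look like the following. The blob names "dist_pos" and "dist_neg" are placeholders for the two "sqrt" outputs computed as above for x - x+ and x - x- respectively; note that "Reduction" produces a 1D blob of length B, so each norm is first reshaped to B-by-1 before concatenating along axis 1:

layer {
  name: "reshape_pos"
  type: "Reshape"
  bottom: "dist_pos"
  top: "dist_pos_2d"
  reshape_param { shape { dim: -1 dim: 1 } }  # B-by-1
}
layer {
  name: "reshape_neg"
  type: "Reshape"
  bottom: "dist_neg"
  top: "dist_neg_2d"
  reshape_param { shape { dim: -1 dim: 1 } }  # B-by-1
}
layer {
  name: "concat_dists"
  type: "Concat"
  bottom: "dist_pos_2d"
  bottom: "dist_neg_2d"
  top: "dists"
  concat_param { axis: 1 }  # B-by-2: one (d+, d-) pair per example
}
layer {
  name: "softmax_dists"
  type: "Softmax"
  bottom: "dists"
  top: "softmax_dists"  # softmax over the two distances per example
}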