Question
I have been trying to implement the softmax version of the triplet loss in Caffe described in
Hoffer and Ailon, Deep Metric Learning Using Triplet Network, ICLR 2015.
I have tried, but I am finding it hard to calculate the gradient, since the L2 norm in the exponent is not squared.
Can someone please help me here?
Answer 1:
Implementing the L2 norm using existing Caffe layers can save you all the hassle.
Here's one way to compute ||x1 - x2||_2 in Caffe for "bottom"s x1 and x2 (assuming x1 and x2 are B-by-C blobs, so this computes B norms of C-dimensional differences):
layer {
  name: "x1-x2"
  type: "Eltwise"
  bottom: "x1"
  bottom: "x2"
  top: "x1-x2"
  eltwise_param {
    operation: SUM
    coeff: 1 coeff: -1  # computes x1 - x2 element-wise
  }
}
layer {
  name: "sqr_norm"
  type: "Reduction"
  bottom: "x1-x2"
  top: "sqr_norm"
  # sum of squares along axis 1: yields a B-element blob of squared L2 norms
  reduction_param { operation: SUMSQ axis: 1 }
}
layer {
  name: "sqrt"
  type: "Power"
  bottom: "sqr_norm"
  top: "sqrt"
  power_param { power: 0.5 }  # element-wise square root -> the L2 norms
}
For the triplet loss defined in the paper, you need to compute the L2 norm for x - x+ and for x - x-, concatenate these two blobs, and feed the concatenated blob to a "Softmax" layer (a sketch follows below).
No need for dirty gradient computations.
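Here is a minimal sketch of that last step, assuming the norm pipeline above has been instantiated twice (once for x - x+ and once for x - x-) with its final tops renamed "dist_pos" and "dist_neg"; all blob and layer names here are illustrative, not from the original answer:

# Reduction outputs 1-D blobs of shape (B); reshape to B-by-1 so axis 1 exists for Concat
layer {
  name: "reshape_pos"
  type: "Reshape"
  bottom: "dist_pos"
  top: "dist_pos_2d"
  reshape_param { shape { dim: -1 dim: 1 } }
}
layer {
  name: "reshape_neg"
  type: "Reshape"
  bottom: "dist_neg"
  top: "dist_neg_2d"
  reshape_param { shape { dim: -1 dim: 1 } }
}
layer {
  name: "concat_dists"
  type: "Concat"
  bottom: "dist_pos_2d"
  bottom: "dist_neg_2d"
  top: "dists"
  concat_param { axis: 1 }  # B-by-2: one (d+, d-) pair per example
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "dists"
  top: "prob"
  softmax_param { axis: 1 }  # softmax over the two distances of each example
}

The paper then compares the softmax output to the constant vector (0, 1) with a mean-squared error; in Caffe that could be done, for instance, with a "EuclideanLoss" layer against a constant (0, 1) blob.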
Answer 2:
This is a math question, but here it goes. The first gradient below is the familiar one, for the squared norm; the second is what you get when the norm is not squared.
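In symbols, here is a sketch of the two gradients, with x and y standing for the two embeddings being compared (the second follows from the first by the chain rule on the square root):

\frac{\partial}{\partial x}\left\|x - y\right\|_2^2 = 2\,(x - y)

\frac{\partial}{\partial x}\left\|x - y\right\|_2 = \frac{\partial}{\partial x}\sqrt{\left\|x - y\right\|_2^2} = \frac{x - y}{\left\|x - y\right\|_2}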
Source: https://stackoverflow.com/questions/36277060/gradient-calculation-for-softmax-version-of-triplet-loss