Gradient calculation for softmax version of triplet loss

终归单人心 2021-01-15 15:19

I have been trying to implement, in Caffe, the softmax version of the triplet loss described in
Hoffer and Ailon, Deep Metric Learning Using Triplet Network, ICLR 2015.

2 Answers
  • 2021-01-15 16:02

    This is a math question, but here it goes. The first equation is what you're used to, and the second is what you do when it's not squared.
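
    For reference, these are the standard gradients of the squared and unsquared Euclidean distance, which are presumably the two cases meant here. The symbols x and y are illustrative and do not come from the original post; the second expression is only defined for x ≠ y.

```latex
\[
\frac{\partial}{\partial x}\,\lVert x - y \rVert_2^2 = 2\,(x - y),
\qquad
\frac{\partial}{\partial x}\,\lVert x - y \rVert_2 = \frac{x - y}{\lVert x - y \rVert_2}
\quad (x \neq y).
\]
```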

  • 2021-01-15 16:04

    Implementing the L2 norm using existing Caffe layers can save you all the hassle.

    Here's one way to compute ||x1-x2||_2 in Caffe for "bottom"s x1 and x2 (assuming x1 and x2 are B-by-C blobs, so that the layers compute B norms, one per C-dimensional difference vector):

    layer {
      name: "x1-x2"
      type: "Eltwise"
      bottom: "x1"
      bottom: "x2"
      top: "x1-x2"
      # SUM with coefficients 1 and -1 computes x1 - x2 element-wise
      eltwise_param {
        operation: SUM
        coeff: 1 coeff: -1
      }
    }
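    layer {
      name: "sqr_norm"
      type: "Reduction"
      bottom: "x1-x2"
      top: "sqr_norm"
      # sum of squares over axis 1 (the C dimension): one value per sample
      reduction_param { operation: SUMSQ axis: 1 }
    }
    layer {
      name: "sqrt"
      type: "Power"
      bottom: "sqr_norm"
      top: "sqrt"
      # raising to the power 0.5 takes the square root, giving the L2 norm
      power_param { power: 0.5 }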
    }
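
    As a sanity check, the arithmetic those three layers perform can be sketched in plain Python. This is only an illustration of the forward computation, not Caffe code; the function name `l2_distances` and the list-of-lists input format are assumptions made for the example.

```python
import math


def l2_distances(x1, x2):
    """Row-wise ||x1 - x2||_2 for two B-by-C inputs given as lists of lists."""
    distances = []
    for row1, row2 in zip(x1, x2):
        # Eltwise SUM with coefficients (1, -1): element-wise difference.
        diff = [a - b for a, b in zip(row1, row2)]
        # Reduction SUMSQ over axis 1: sum of squares within each row.
        sqr_norm = sum(d * d for d in diff)
        # Power with power 0.5: square root of the squared norm.
        distances.append(math.sqrt(sqr_norm))
    return distances


if __name__ == "__main__":
    x1 = [[3.0, 0.0], [1.0, 1.0]]
    x2 = [[0.0, 4.0], [1.0, 1.0]]
    print(l2_distances(x1, x2))  # [5.0, 0.0]
```

    Note that the derivative of the square root is unbounded at zero, so identical inputs (a distance of exactly 0) may need special care during training.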
    

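    For the triplet loss defined in the paper, you need to compute the L2 norm twice, once for x - x+ and once for x - x-, then concatenate the two resulting blobs and feed the concatenated blob to a "Softmax" layer.
    Caffe backpropagates through these built-in layers on its own, so there is no need to derive the gradients by hand.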
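
    To make the last step concrete, here is a minimal plain-Python sketch of the softmax over the two distances. The loss in `triplet_softmax_loss` follows my reading of the paper, the squared distance between the softmax output and the target (0, 1), and should be treated as an assumption; the function names are hypothetical as well.

```python
import math


def softmax_pair(d_pos, d_neg):
    """Softmax over the two distances ||x - x+|| and ||x - x-||."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(d_pos, d_neg)
    e_pos = math.exp(d_pos - m)
    e_neg = math.exp(d_neg - m)
    total = e_pos + e_neg
    return e_pos / total, e_neg / total


def triplet_softmax_loss(d_pos, d_neg):
    """Squared distance between the softmax output and the target (0, 1).

    Assumed form of the paper's loss, not taken from the answer above.
    """
    p_pos, p_neg = softmax_pair(d_pos, d_neg)
    return p_pos ** 2 + (p_neg - 1.0) ** 2


if __name__ == "__main__":
    # Equal distances give a softmax of (0.5, 0.5).
    print(softmax_pair(1.0, 1.0))
    # The loss is smaller when the negative is farther away than the positive.
    print(triplet_softmax_loss(0.1, 2.0) < triplet_softmax_loss(2.0, 0.1))
```

    In Caffe terms, the two softmax outputs would then be compared against a constant (0, 1) target, for example with a Euclidean loss layer; that wiring is likewise an assumption rather than part of the answer.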
