I have been trying to implement, in Caffe, the softmax version of the triplet loss described in Hoffer and Ailon, "Deep Metric Learning Using Triplet Network", ICLR 2015.
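For reference, this is the loss as I read it in the paper, with $\mathrm{Net}(\cdot)$ the embedding network and $x^+$, $x^-$ the positive and negative samples. Treat it as my transcription, not a quote. Note that the distances inside the exponentials are plain (non-squared) Euclidean distances.

$$
d_+ = \frac{e^{\|\mathrm{Net}(x) - \mathrm{Net}(x^+)\|_2}}{e^{\|\mathrm{Net}(x) - \mathrm{Net}(x^+)\|_2} + e^{\|\mathrm{Net}(x) - \mathrm{Net}(x^-)\|_2}},
\qquad
d_- = \frac{e^{\|\mathrm{Net}(x) - \mathrm{Net}(x^-)\|_2}}{e^{\|\mathrm{Net}(x) - \mathrm{Net}(x^+)\|_2} + e^{\|\mathrm{Net}(x) - \mathrm{Net}(x^-)\|_2}}
$$

$$
\mathrm{Loss}(d_+, d_-) = \|(d_+,\; d_- - 1)\|_2^2 = 2\, d_+^2,
\quad \text{since } d_+ + d_- = 1 .
$$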
This is a math question, but here goes. The first equation is what you're used to; the second is what you do when the norm is not squared.
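A minimal NumPy sketch of the forward and backward pass for one triplet, assuming the two equations referred to above are the gradients of the squared distance, $\partial\|v\|^2/\partial v = 2v$, and of the non-squared distance, $\partial\|v\|/\partial v = v/\|v\|$. The function name `softmax_triplet_loss` and the `eps` guard are hypothetical choices of mine, and this is plain NumPy rather than a Caffe layer; the three returned gradients are what a layer's backward pass would have to propagate to the anchor, positive and negative blobs.

```python
import numpy as np


def softmax_triplet_loss(a, p, n, eps=1e-12):
    """Hypothetical helper: loss and gradients for one triplet.

    a, p, n are the embeddings Net(x), Net(x+), Net(x-) as 1-D arrays.
    Uses non-squared Euclidean distances, as in the softmax triplet loss.
    """
    diff_p, diff_n = a - p, a - n
    dist_p = np.linalg.norm(diff_p)  # ||Net(x) - Net(x+)||_2, not squared
    dist_n = np.linalg.norm(diff_n)  # ||Net(x) - Net(x-)||_2, not squared

    # d+ = e^{dist_p} / (e^{dist_p} + e^{dist_n}), rewritten as a sigmoid
    d_plus = 1.0 / (1.0 + np.exp(dist_n - dist_p))
    loss = 2.0 * d_plus ** 2  # ||(d+, d- - 1)||^2 with d- - 1 = -d+

    # Chain rule: dL/dd+ = 4 d+, and dd+/ddist_p = d+ (1 - d+) = -dd+/ddist_n
    g = 4.0 * d_plus * d_plus * (1.0 - d_plus)

    # A squared norm would give d||v||^2/dv = 2 v.
    # The non-squared norm gives d||v||/dv = v / ||v|| (eps guards v = 0).
    u_p = diff_p / max(dist_p, eps)
    u_n = diff_n / max(dist_n, eps)

    grad_a = g * (u_p - u_n)  # anchor appears in both distances
    grad_p = -g * u_p         # dist_p decreases as p moves toward a
    grad_n = g * u_n          # dist_n enters the loss with a minus sign
    return loss, grad_a, grad_p, grad_n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, p, n = (rng.standard_normal(4) for _ in range(3))
    loss, grad_a, _, _ = softmax_triplet_loss(a, p, n)

    # Central finite differences on the anchor to check grad_a
    h = 1e-6
    numeric = np.array([
        (softmax_triplet_loss(a + h * e, p, n)[0]
         - softmax_triplet_loss(a - h * e, p, n)[0]) / (2 * h)
        for e in np.eye(a.size)
    ])
    # The second value printed should be close to zero
    print(loss, np.max(np.abs(numeric - grad_a)))
```

A finite-difference check like the one at the bottom is a cheap way to validate a hand-written backward pass before wiring it into Caffe.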