I am new to Keras. I need some help writing a custom loss function in Keras with the TensorFlow backend for the following loss equation.
Recent versions of Keras actually support losses with different shapes of y_pred and y_true. The built-in loss sparse_categorical_crossentropy is an example of this. The TensorFlow implementation of this loss is here: https://github.com/keras-team/keras/blob/0fc33feb5f4efe3bb823c57a8390f52932a966ab/keras/backend/tensorflow_backend.py#L3570

Notice how it says "target: An integer tensor." and not "target: A tensor of the same shape as `output`." like the others. I tried it with a custom loss I wrote myself and it seems to work fine. I'm using Keras 2.2.4.
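For instance, here is a hand-rolled loss in the same spirit (a minimal sketch against Keras 2.2.4; the loss name and toy model are my own):

import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def my_sparse_loss(y_true, y_pred):
    # y_true: integer class indices, shape (batch, 1)
    # y_pred: softmax probabilities, shape (batch, num_classes)
    y_true = K.cast(K.flatten(y_true), 'int32')
    y_true_onehot = K.one_hot(y_true, K.int_shape(y_pred)[-1])
    return K.categorical_crossentropy(y_true_onehot, y_pred)

model = Sequential([Dense(5, activation='softmax', input_shape=(8,))])
model.compile(loss=my_sparse_loss, optimizer='adam')
model.fit(np.random.rand(32, 8), np.random.randint(5, size=(32, 1)))

Here y_true has shape (32, 1) while y_pred has shape (32, 5), and Keras accepts it.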
You can pretty much just translate the numpy functions into Keras backend functions. The only thing to watch out for is setting up the right broadcast shapes.
from keras import backend as K

def l2_loss_keras(y_true, y_pred):
    # set up meshgrid: (height, width, 2)
    meshgrid = K.tf.meshgrid(K.arange(im_height), K.arange(im_width))
    meshgrid = K.cast(K.transpose(K.stack(meshgrid)), K.floatx())

    # set up broadcast shape: (batch_size, height, width, num_joints, 2)
    meshgrid_broadcast = K.expand_dims(K.expand_dims(meshgrid, 0), -2)
    y_true_broadcast = K.expand_dims(K.expand_dims(y_true, 1), 2)
    diff = meshgrid_broadcast - y_true_broadcast

    # compute loss: first sum over (height, width), then average over num_joints
    ground = K.exp(-0.5 * K.sum(K.square(diff), axis=-1) / sigma ** 2)
    loss = K.sum(K.square(ground - y_pred), axis=[1, 2])
    return K.mean(loss, axis=-1)
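Before comparing against numpy, you can sanity-check the output shape (a quick sketch; the tiny dimensions are only for readability):

import numpy as np
from keras import backend as K

im_height, im_width, num_joints, sigma = 4, 4, 2, 1.0
y_true_small = K.variable(np.random.rand(3, num_joints, 2))
y_pred_small = K.variable(np.random.rand(3, im_height, im_width, num_joints))
print(K.eval(l2_loss_keras(y_true_small, y_pred_small)).shape)  # (3,): one loss value per sample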
To verify it:
def l2_loss_numpy(y_true, y_pred):
    loss = 0
    n = y_true.shape[0]
    for j in range(n):
        for i in range(num_joints):
            yv, xv = np.meshgrid(np.arange(0, im_height), np.arange(0, im_width))
            z = np.stack([xv, yv]).transpose(1, 2, 0)
            ground = np.exp(-0.5 * (((z - y_true[j, i, :]) ** 2).sum(axis=2)) / (sigma ** 2))
            loss = loss + np.sum((ground - y_pred[j, :, :, i]) ** 2)
    return loss / num_joints
batch_size = 32
num_joints = 10
sigma = 5
im_width = 256
im_height = 256
y_true = 255 * np.random.rand(batch_size, num_joints, 2)
y_pred = 255 * np.random.rand(batch_size, im_height, im_width, num_joints)
print(l2_loss_numpy(y_true, y_pred))
45448272129.0
print(K.eval(l2_loss_keras(K.variable(y_true), K.variable(y_pred))).sum())
4.5448e+10
The number is truncated under the default dtype float32. If you run it with the dtype set to float64:
K.set_floatx('float64')  # so K.variable creates float64 tensors

y_true = 255 * np.random.rand(batch_size, num_joints, 2)
y_pred = 255 * np.random.rand(batch_size, im_height, im_width, num_joints)
print(l2_loss_numpy(y_true, y_pred))
45460126940.6
print(K.eval(l2_loss_keras(K.variable(y_true), K.variable(y_pred))).sum())
45460126940.6
EDIT:

It seems that Keras requires y_true and y_pred to have the same number of dimensions. For example, on the following testing model:
X = np.random.rand(batch_size, 256, 256, 3)
model = Sequential([Dense(10, input_shape=(256, 256, 3))])
model.compile(loss=l2_loss_keras, optimizer='adam')
model.fit(X, y_true, batch_size=8)
ValueError: Cannot feed value of shape (8, 10, 2) for Tensor 'dense_2_target:0', which has shape '(?, ?, ?, ?)'
To deal with this problem, you can add a dummy dimension with expand_dims before feeding y_true into the model:
def l2_loss_keras(y_true, y_pred):
    ...
    y_true_broadcast = K.expand_dims(y_true, 1)  # change this line
    ...

model.fit(X, np.expand_dims(y_true, axis=1), batch_size=8)
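Putting the pieces together, the adjusted loss is the same function with one fewer expand_dims (a sketch; only the broadcast line changes from the version above):

def l2_loss_keras(y_true, y_pred):
    # y_true is now fed with shape (batch_size, 1, num_joints, 2)
    meshgrid = K.tf.meshgrid(K.arange(im_height), K.arange(im_width))
    meshgrid = K.cast(K.transpose(K.stack(meshgrid)), K.floatx())
    meshgrid_broadcast = K.expand_dims(K.expand_dims(meshgrid, 0), -2)
    # (batch_size, 1, num_joints, 2) -> (batch_size, 1, 1, num_joints, 2)
    y_true_broadcast = K.expand_dims(y_true, 1)
    diff = meshgrid_broadcast - y_true_broadcast
    ground = K.exp(-0.5 * K.sum(K.square(diff), axis=-1) / sigma ** 2)
    loss = K.sum(K.square(ground - y_pred), axis=[1, 2])
    return K.mean(loss, axis=-1)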
Yu's answer is correct; still, I want to share my experience. Whenever you want to write a custom loss function, beware of certain things:

y_pred, which comes from the output layer of your model, has a 3-D shape, so y_true will default to a 3-D shape as well. BUT at run time, i.e. during fit, if you pass the target data as 2-D, which many people do, you might end up getting an error inside your loss function. E.g., if you are calculating sigmoid_cross_entropy_with_logits, it will complain. Hence, do pass targets as 3-D via np.expand_dims, as shown in the sketch below.

The loss function must take y_true and y_pred as its arguments (if somebody knows whether we can use any other names as arguments, please shout).
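For illustration, a minimal sketch of the pitfall (the model, loss, and shapes here are my own assumptions):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(1, activation='sigmoid', input_shape=(5, 4))])  # output: (batch, 5, 1), 3-D
model.compile(loss='binary_crossentropy', optimizer='adam')

X = np.random.rand(16, 5, 4)
y = np.random.randint(2, size=(16, 5))     # 2-D targets can trip up some loss functions
model.fit(X, np.expand_dims(y, axis=-1))   # expand to 3-D so y_true matches y_pred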