I'm trying to implement a sentence similarity architecture based on this work using the STS dataset. Labels are normalized similarity scores from 0 to 1 so it is assumed to b
NaN is a common issue in deep learning regression. Because you are using a Siamese network, you can try the following:
It is not easy to make deep learning work perfectly.
I didn't run into the NaN issue, but my loss wouldn't change. I found this info
Check this out:
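from keras import backend as K  # assumed import; the snippet uses K without showing it

def cosine_distance(shapes):
    # shapes is a pair of tensors, e.g. the two branch outputs of the Siamese network
    y_true, y_pred = shapes

    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        # Clamp magnitudes and the norm at epsilon so the division never becomes 0/0
        return K.sign(x) * K.maximum(K.abs(x), K.epsilon()) / K.maximum(norm, K.epsilon())

    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    # Mean over the batch of (1 - cosine similarity)
    return K.mean(1 - K.sum((y_true * y_pred), axis=-1))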
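
To see why the epsilon clamp matters for the NaN problem, here is a rough NumPy mirror of the function above. It is only a sketch for checking the arithmetic outside Keras: the names `l2_normalize_np`, `cosine_distance_np` and `naive_cosine_distance_np` are made up for this example, and `EPS = 1e-7` assumes the default value of `K.epsilon()`.

```python
import numpy as np

EPS = 1e-7  # assumption: matches the default K.epsilon() in Keras


def l2_normalize_np(x, axis=-1, eps=EPS):
    # Same formula as the Keras version: clamp |x| and the norm at eps
    norm = np.sqrt(np.sum(np.square(x), axis=axis, keepdims=True))
    return np.sign(x) * np.maximum(np.abs(x), eps) / np.maximum(norm, eps)


def cosine_distance_np(a, b):
    # Mean over the batch of (1 - cosine similarity)
    a = l2_normalize_np(a)
    b = l2_normalize_np(b)
    return np.mean(1 - np.sum(a * b, axis=-1))


def naive_cosine_distance_np(a, b):
    # No clamping: an all-zero vector gives 0/0, which is NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.mean(1 - np.sum(a * b, axis=-1))


zeros = np.zeros((1, 3))
ones = np.ones((1, 3))

print(np.isnan(naive_cosine_distance_np(zeros, ones)))  # True
print(np.isnan(cosine_distance_np(zeros, ones)))        # False
```

With an all-zero embedding the unclamped version produces NaN, while the clamped version returns a finite distance of 1. If your network can output a zero vector for some input (for example after a ReLU), that is one place a NaN loss can come from.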