I have a network that produces a 4D output tensor where the value at each position in spatial dimensions (~pixel) is to be interpreted as the class probabilities for that po
Just flatten the output to a 2D tensor of size (num_batches, height * width * num_classes). You can do this with the Flatten layer. Ensure that your y is flattened the same way (normally calling y = y.reshape((num_batches, height * width * num_classes)) is enough).
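To make the flattening step concrete, here is a minimal NumPy sketch. The sizes and the random arrays are made up purely for illustration, and plain reshape stands in for the Flatten layer; the only point is that the predictions and y are reshaped in the same (row-major) order so their entries still line up.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_batches, height, width, num_classes = 2, 3, 4, 5

rng = np.random.default_rng(0)

# Stand-in for the network's 4D output: class probabilities at each position.
logits = rng.normal(size=(num_batches, height, width, num_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# One-hot targets with the same 4D layout as the output.
labels = rng.integers(0, num_classes, size=(num_batches, height, width))
y = np.eye(num_classes)[labels]

# Flatten both arrays the same way so corresponding entries stay aligned.
probs_flat = probs.reshape((num_batches, height * width * num_classes))
y_flat = y.reshape((num_batches, height * width * num_classes))
```

With this layout, the entry for class k at position (i, j) ends up at flat index (i * width + j) * num_classes + k in both arrays.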
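For your second question, applying categorical crossentropy to the flattened vector of all width*height predictions is essentially the same as averaging the per-position categorical crossentropy over the width*height positions. By the definition of categorical crossentropy, the loss on the flattened vector is the sum of the per-position losses, which differs from their average only by the constant factor width*height.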
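A quick numerical check of this, as a sketch only: it uses the textbook definition of categorical crossentropy, -sum(y * log(p)), written out in NumPy with made-up sizes and random data. A framework's built-in loss may renormalize or clip its inputs, so treat this as an illustration of the definition rather than of any particular library's behaviour.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_batches, height, width, num_classes = 2, 3, 4, 5
rng = np.random.default_rng(0)

# Random per-position class probabilities and one-hot targets.
logits = rng.normal(size=(num_batches, height, width, num_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
labels = rng.integers(0, num_classes, size=(num_batches, height, width))
y = np.eye(num_classes)[labels]

# Crossentropy computed separately at every position, then averaged.
per_position = -(y * np.log(probs)).sum(axis=-1)    # (num_batches, height, width)
mean_per_position = per_position.mean(axis=(1, 2))  # (num_batches,)

# The same definition applied once to the flattened vectors.
probs_flat = probs.reshape((num_batches, -1))
y_flat = y.reshape((num_batches, -1))
flat_loss = -(y_flat * np.log(probs_flat)).sum(axis=-1)  # (num_batches,)

# The flattened loss is width*height times the per-position average.
scaled_mean = height * width * mean_per_position
```

Since the two quantities differ only by a constant factor, they have the same minimizer; the factor just rescales the gradients.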