I am relatively new to DL and Keras. I am trying to implement perceptual loss using the pretrained VGG16 in Keras, but I am having some trouble.
Well, the first problem is significant.
VGG models were made for color images with 3 channels... so it's not quite the right model for your case. I'm not sure if there are pretrained models for black & white images, but you should search for them.
A workaround for that, which I don't know how well it will work, is to make 3 copies of mainModel's output:
tripleOut = Concatenate()([mainModel.output,mainModel.output,mainModel.output])
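(Concatenate joins tensors along the last axis by default, so with channels_last data this turns the single gray channel into the 3 identical channels VGG expects; import it from keras.layers.)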
The second problem: nowhere in your code did you create a connection between the input and output of fullModel. You must connect the output of mainModel to the input of lossModel.
But first, let's prepare the VGG model for multiple outputs.
Preparing lossModel for multiple outputs
You must select which layers of the VGG model will be used to calculate the loss. If you use only the final output, there won't really be a good perceptual loss, because the final output is made more of concepts than of features.
So, after you select the layers, make a list of their indices or names:
selectedLayers = [1,2,9,10,17,18] #for instance
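If you haven't built lossModel yet, here is a minimal sketch (the ImageNet weights, include_top=False and the 224x224 input shape are illustrative assumptions, not requirements):
from keras.applications.vgg16 import VGG16

#a VGG16 without the classifier head, loaded with ImageNet weights
lossModel = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))

#print the index and name of each layer to help pick selectedLayers
for i, layer in enumerate(lossModel.layers):
    print(i, layer.name)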
Let's make a new model from VGG16, but with multiple outputs:
#a list with the output tensors for each selected layer:
selectedOutputs = [lossModel.layers[i].output for i in selectedLayers]
#or [lossModel.get_layer(name).output for name in selectedLayers]
#a new model that has multiple outputs:
lossModel = Model(lossModel.inputs,selectedOutputs)
Now, here we create the connection between the two models. We call the lossModel (as if it were a layer), taking the output of mainModel as input:
lossModelOutputs = lossModel(tripleOut) #or mainModel.output if not using tripleOut
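For this call to work, the shape of tripleOut must be compatible with lossModel's input shape; otherwise Keras will raise a shape error at this line.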
Now, with the graph entirely connected from the input of mainModel to the output of lossModel, we can create the fullModel:
fullModel = Model(mainModel.input, lossModelOutputs)
#if the line above doesn't work due to a type problem, make a list with lossModelOutputs:
lossModelOutputs = [lossModelOutputs[i] for i in range(len(selectedLayers))]
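At this point you can call fullModel.summary() to confirm that the graph runs from mainModel's input all the way through the selected VGG outputs.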
Take the predictions of this new lossModel, just as you did. But for the workaround, let's make it triple channel as well:
triple_Y_train = np.concatenate((Y_train,Y_train,Y_train),axis=-1)
Y_train_lossModel = lossModel.predict(triple_Y_train)
#the output will be a list of numpy arrays, one for each of the selected layers
Make sure you make each layer of lossModel non-trainable before fullModel.compile().
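A minimal sketch of freezing it (setting trainable on the wrapped model and, to be safe across Keras versions, on each individual layer):
lossModel.trainable = False
for layer in lossModel.layers:
    layer.trainable = False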
If you want 'mse' for all outputs, you just do:
fullModel.compile(loss='mse', ...)
If you want a different loss for each layer, pass a list of losses:
fullModel.compile(loss=[loss1,loss2,loss3,...], ...)
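Putting compile and training together, a minimal sketch (the optimizer, loss_weights, epochs and batch_size here are placeholder choices, and X_train is assumed to be your training input from before):
fullModel.compile(optimizer='adam', loss='mse',
                  loss_weights=[1.0]*len(selectedLayers)) #optional: weight each selected layer's loss
fullModel.fit(X_train, Y_train_lossModel, epochs=10, batch_size=32)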
Since VGG is supposed to work with images in the caffe format, you might want to add a few layers after mainModel to make the output suitable. It's not absolutely required, but it would get the best performance from VGG.
See how Keras transforms an input image ranging from 0 to 255 into the caffe format in keras/applications/imagenet_utils.py (around lines 15 or 44).
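In short, the caffe mode flips RGB to BGR and subtracts the ImageNet mean pixel values. A minimal sketch of doing that inside the graph, assuming the TensorFlow backend and that mainModel's tripled output is RGB in the 0-255 range:
from keras.layers import Lambda
import keras.backend as K

def rgb_to_caffe(x):
    x = x[..., ::-1] #RGB -> BGR
    return x - K.constant([103.939, 116.779, 123.68]) #subtract the BGR channel means

caffeOut = Lambda(rgb_to_caffe)(tripleOut)
lossModelOutputs = lossModel(caffeOut) #feed VGG caffe-formatted images instead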