Implement perceptual loss with pretrained VGG using Keras

温柔的废话 2021-02-09 13:24

I am relatively new to DL and Keras.

I am trying to implement perceptual loss using the pretrained VGG16 in Keras but am running into some trouble. I already found that question b

1 answer
  •  小鲜肉 (original poster)
    2021-02-09 14:03

    Number of channels

    Well, the first problem is significant.

    VGG models were built for color images with 3 channels, so VGG is not quite the right model for your case. I'm not sure whether there are pretrained models for black & white images, but you should search for them.

    A workaround, though I don't know how well it will work, is to make 3 copies of mainModel's output and stack them as channels.

    from keras.layers import Concatenate
    tripleOut = Concatenate()([mainModel.output,mainModel.output,mainModel.output])
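
To see what this workaround does to the shapes, here is a minimal NumPy sketch. The array `gray` is a made-up stand-in for mainModel's output; Keras' Concatenate() joins along the last axis by default, which corresponds to `axis=-1` below.

```python
import numpy as np

# Hypothetical batch of 2 grayscale images, 4x4 pixels, one channel.
gray = np.arange(2 * 4 * 4, dtype="float32").reshape(2, 4, 4, 1)

# Repeat the single channel three times along the last (channel) axis.
triple = np.concatenate([gray, gray, gray], axis=-1)

print(gray.shape, "->", triple.shape)
```

All three channels carry the same values, so VGG sees a gray image encoded as RGB.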
    

    Graph disconnected

    This means that nowhere in your code did you create a connection between the input and the output of fullModel. You must connect the output of mainModel to the input of lossModel.

    But first, let's prepare the VGG model for multiple outputs.

    Preparing lossModel for multiple outputs

    You must select which layers of the VGG model will be used to calculate the loss. If you use only the final output, you won't get a very good perceptual loss, because the final output encodes high-level concepts more than visual features.

    So, after you select the layers, make a list of their indices or names:

    selectedLayers = [1,2,9,10,17,18] #for instance
    

    Let's make a new model from VGG16, but with multiple outputs:

    from keras.models import Model

    #a list with the output tensors for each selected layer
    #(lossModel is the pretrained VGG16 at this point):
    selectedOutputs = [lossModel.layers[i].output for i in selectedLayers]
         #or [lossModel.get_layer(name).output for name in selectedLayers]
    
    #a new model that has multiple outputs:
    lossModel = Model(lossModel.inputs,selectedOutputs)
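
Selecting layers by name can be easier to read than indices. The following is only a sketch under assumptions: it uses `tensorflow.keras`, loads VGG16 with `weights=None` and a small input to stay light (a real perceptual loss would use `weights="imagenet"`), and the three layer names are an arbitrary choice for illustration.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

# Assumption: weights=None and a 64x64 input keep this sketch light;
# for a real perceptual loss you would load weights="imagenet".
vgg = VGG16(weights=None, include_top=False, input_shape=(64, 64, 3))

# Print the index -> name mapping to help pick layers.
for i, layer in enumerate(vgg.layers):
    print(i, layer.name)

# Hypothetical selection, by name instead of by index.
selectedNames = ["block1_conv2", "block3_conv3", "block5_conv3"]
selectedOutputs = [vgg.get_layer(name).output for name in selectedNames]

# A new model with one output per selected layer.
lossModel = Model(vgg.inputs, selectedOutputs)

for name, tensor in zip(selectedNames, lossModel.outputs):
    print(name, tensor.shape)
```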
    

    Joining the models

    Now, here we create the connection between the two models.

    We call the lossModel (as if it were a layer) taking the output of the mainModel as input:

    lossModelOutputs = lossModel(tripleOut) #or mainModel.output if not using tripleOut
    

    Now, with the graph entirely connected from the input of mainModel to the output of lossModel, we can create the fullModel:

    fullModel = Model(mainModel.input, lossModelOutputs)
    
    #if the line above doesn't work due to a type problem, make a list with lossModelOutputs
    #and then create fullModel again with that list:
    lossModelOutputs = [lossModelOutputs[i] for i in range(len(selectedLayers))]
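
Putting the steps so far together, a minimal end-to-end sketch could look like this. The tiny `mainModel` is a hypothetical stand-in for the one in the question, the imports assume `tensorflow.keras`, and `weights=None` is used only to avoid a download here.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Concatenate, Conv2D, Input
from tensorflow.keras.models import Model

# Hypothetical stand-in for mainModel: grayscale in, grayscale out.
inp = Input(shape=(32, 32, 1))
out = Conv2D(1, 3, padding="same", activation="sigmoid")(inp)
mainModel = Model(inp, out)

# Assumption: weights=None avoids a download; use "imagenet" in practice.
vgg = VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))
selectedLayers = [1, 2, 9, 10, 17, 18]
lossModel = Model(vgg.inputs, [vgg.layers[i].output for i in selectedLayers])

# Triple the channel, then call lossModel as if it were a layer.
tripleOut = Concatenate()([mainModel.output] * 3)
lossModelOutputs = lossModel(tripleOut)

# The graph now runs from mainModel's input to lossModel's outputs.
fullModel = Model(mainModel.input, lossModelOutputs)
print(len(fullModel.outputs), "outputs")
```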
    

    Training

    Compute the training targets by running Y_train through this new lossModel, just as you did. For the workaround, make Y_train triple channel as well:

    triple_Y_train = np.concatenate((Y_train,Y_train,Y_train),axis=-1)
    Y_train_lossModel = lossModel.predict(triple_Y_train)
    #the output will be a list of numpy arrays, one for each of the selected layers   
    

    Make sure you make each layer of lossModel non trainable before fullModel.compile().
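
One way to freeze the layers, sketched with `tensorflow.keras` and `weights=None` (both assumptions, chosen to keep the example light), is to set `trainable = False` and check that no trainable weights remain:

```python
from tensorflow.keras.applications import VGG16

# Assumption: weights=None keeps the sketch light; the idea is the same with "imagenet".
lossModel = VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))
before = len(lossModel.trainable_weights)

# Freeze the model and every layer in it before compiling fullModel.
lossModel.trainable = False
for layer in lossModel.layers:
    layer.trainable = False

after = len(lossModel.trainable_weights)
print(before, "->", after)
```

With the layers frozen, only the weights of mainModel are updated when fullModel is trained.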

    If you want 'mse' for all outputs, you just do:

    fullModel.compile(loss='mse', ...)
    

    If you want a different loss for each layer, pass a list of losses:

    fullModel.compile(loss=[loss1,loss2,loss3,...], ...)
    

    Additional considerations

    Since VGG is supposed to work with images in the caffe format, you might want to add a few layers after mainModel to make its output suitable. It's not absolutely required, but it would get the best performance out of VGG.

    See how Keras transforms an input image ranging from 0 to 255 into the caffe format here, at line 15 or 44.
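
For reference, the caffe-style conversion amounts to flipping RGB to BGR and subtracting the per-channel ImageNet means, with no rescaling. Below is a NumPy sketch of that transformation; the helper name `to_caffe` is made up for illustration, and inside the model the same arithmetic would have to be done with layers instead.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras' "caffe" preprocessing mode.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")

def to_caffe(rgb_0_255):
    """Convert RGB images in [0, 255] to BGR and subtract the channel means."""
    bgr = rgb_0_255[..., ::-1].astype("float32")
    return bgr - BGR_MEAN

# Hypothetical mid-gray image batch: every pixel is 128 in all three channels.
batch = np.full((1, 2, 2, 3), 128.0, dtype="float32")
print(to_caffe(batch)[0, 0, 0])
```

If mainModel outputs values in the 0 to 1 range, they would need to be multiplied by 255 first.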
