Question
Imagine a typical auto-encoder/decoder model. However, instead of a generic decoder, where deconvolutions together with upscaling are used to create/synthesize a tensor similar to the model's input, I need to implement a structured/custom decoder.
Here, I need the decoder to take its input, e.g. a 10x2 tensor where each row represents x, y positions or coordinates, and render a fixed, predefined-size image containing 10 gaussian distributions generated at the locations specified by the input.
Put another way, I need to create an empty fixed-size tensor, fill the locations specified by the 10 coordinates with the value 1, and then sweep a gaussian kernel over the whole tensor. For example, imagine the following 1-D scenario: let the input to the whole model be a vector of size 10. If the input to the decoder is [3, 7], which are two x-coordinates (0-indexed), and the gaussian kernel of size 3 that we want to use is [0.28, 0.44, 0.28], then the output of the decoder should look like the following (the same size as the original input of the model, which is 10):
[0, 0, 0.28, 0.44, 0.28, 0, 0.28, 0.44, 0.28, 0]
which is the same as
[0, 0, 0, 1, 0, 0, 0, 1, 0, 0] * [0.28, 0.44, 0.28]
where * represents the convolution operator. Note that in the first vector, the 1s are located at positions 3 and 7, assuming 0-indexing.
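For concreteness, this 1-D example can be checked with a few lines of numpy (illustrative only; np.convolve with mode='same' keeps the output at the input's length):

import numpy as np

impulses = np.zeros(10)
impulses[[3, 7]] = 1.0                 # 1s at the two x-coordinates
kernel = np.array([0.28, 0.44, 0.28])  # the gaussian kernel of size 3
print(np.convolve(impulses, kernel, mode='same'))
# [0.   0.   0.28 0.44 0.28 0.   0.28 0.44 0.28 0.  ]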
Finally, a typical pixel loss such as MSE will be calculated. The important part is that this rendering module needs to be able to backpropagate the errors from the loss to its inputs, which are the coordinates.
This module itself does not have any trainable parameters. Also, I do not want to change the layers coming before this rendering module; they need to stay as they are. In a more advanced setting, I would also like to provide the 4 covariance values as input too, i.e. the input to the renderer would be of the form [num_points, 5], where each row is [x_coord, y_coord, cov(x,x), cov(x,y), cov(y,y)].
How can I implement such a module in any of the available deep learning frameworks? A hint towards something similar would also be very useful.
Answer 1:
In my experience, punctual (point-like) things in neural networks perform badly, because they cut off the influence of distant pixels.
Thus, instead of using a gaussian kernel, it is better to apply an actual gaussian function to all pixels.
So, taking an isotropic 2D gaussian distribution function:

E(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right)

We can use it by evaluating one gaussian per predicted point at every pixel and summing them:

\text{output}(x, y) = \sum_{i=1}^{10} E_i(x, y)

This means a few steps in a custom function:
import keras.backend as K

image_size = 64      # output resolution; set this to your target image size
square_sigma = 1.0   # sigma squared of the isotropic gaussian
pi = 3.141592653589793

def coords_to_gaussian(x): # where x is shape (batch, 10, 2), and 2 = x, y
    # pixel coordinates - must match the range of the x and y values;
    # here I suppose from 0 to image_size, but you may want them normalized
    x_pixels = K.reshape(K.arange(image_size, dtype='float32'),
                         (1, 1, image_size, 1))
    x_pixels = K.concatenate([x_pixels] * image_size, axis=-1) # shape (1, 1, size, size)
    y_pixels = K.permute_dimensions(x_pixels, (0, 1, 3, 2))
    pixels = K.stack([x_pixels, y_pixels], axis=-1)            # shape (1, 1, size, size, 2)

    # adjusting the AE locations to a compatible shape:
    locations = K.reshape(x, (-1, 10, 1, 1, 2))

    # calculating the exponent (the upper part of the equation):
    result = K.square(pixels - locations)                  # shape (batch, 10, size, size, 2)
    result = -K.sum(result, axis=-1) / (2 * square_sigma)  # shape (batch, 10, size, size)

    # calculating E:
    result = K.exp(result) / (2 * pi * square_sigma)

    # sum the 10 channels (principle of superposition)
    result = K.sum(result, axis=1)           # shape (batch, size, size)

    # add a channel for future convolutions
    result = K.expand_dims(result, axis=-1)  # shape (batch, size, size, 1)
    return result
Use this in a Lambda layer:
from keras.layers import Lambda
Lambda(coords_to_gaussian)(coordinates_tensor_from_encoder)
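To confirm that this rendering layer backpropagates to the coordinates, a minimal sanity check (a sketch, assuming a TensorFlow 2.x backend where these backend ops run eagerly) could look like:

import numpy as np
import tensorflow as tf

coords = tf.Variable(np.random.uniform(0, image_size, (1, 10, 2)).astype('float32'))
with tf.GradientTape() as tape:
    rendered = coords_to_gaussian(coords)       # shape (1, size, size, 1)
    loss = tf.reduce_mean(tf.square(rendered))  # dummy MSE against a zero target
print(tape.gradient(loss, coords).shape)        # (1, 10, 2): gradients reach the coords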
I'm not considering the covariances here, but you might find a way to put them in the formulas and adjust the code.
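That said, for the [num_points, 5] input described in the question, one way to fold the covariances in (a sketch only, not tested; it assumes image_size, num_points, and pi are defined as above, and that each covariance matrix \Sigma = [[cov(x,x), cov(x,y)], [cov(x,y), cov(y,y)]] is positive definite) is to use the general 2D gaussian density

E(p) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\!\left(-\tfrac{1}{2}(p-\mu)^\top \Sigma^{-1} (p-\mu)\right)

with the closed-form inverse of the 2x2 covariance:

import keras.backend as K

def coords_cov_to_gaussian(x):  # x has shape (batch, num_points, 5)
    mu_x = K.reshape(x[..., 0], (-1, num_points, 1, 1))
    mu_y = K.reshape(x[..., 1], (-1, num_points, 1, 1))
    a = K.reshape(x[..., 2], (-1, num_points, 1, 1))  # cov(x,x)
    b = K.reshape(x[..., 3], (-1, num_points, 1, 1))  # cov(x,y)
    c = K.reshape(x[..., 4], (-1, num_points, 1, 1))  # cov(y,y)
    det = a * c - K.square(b)                         # |Sigma|, assumed positive

    # pixel grids that broadcast against each other: (1,1,size,1) and (1,1,1,size)
    px = K.reshape(K.arange(image_size, dtype='float32'), (1, 1, image_size, 1))
    py = K.reshape(K.arange(image_size, dtype='float32'), (1, 1, 1, image_size))

    dx = px - mu_x   # (batch, num_points, size, 1)
    dy = py - mu_y   # (batch, num_points, 1, size)

    # Mahalanobis distance via the closed-form 2x2 inverse:
    # Sigma^-1 = 1/det * [[c, -b], [-b, a]]
    maha = (c * K.square(dx) - 2 * b * dx * dy + a * K.square(dy)) / det

    result = K.exp(-0.5 * maha) / (2 * pi * K.sqrt(det))  # (batch, n, size, size)
    result = K.sum(result, axis=1)                        # superposition again
    return K.expand_dims(result, axis=-1)                 # (batch, size, size, 1)

With b = 0 and a = c = square_sigma this reduces exactly to the isotropic version above.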
Source: https://stackoverflow.com/questions/60082775/how-to-implement-a-gaussian-renderer-with-mean-and-variance-values-as-input-in-a