Question
I'm using Tensorflow 2.0 / Keras to generate building footprints for a procedural city.
I would like to feed in a source image representing a city 'block', and have the network return the predicted building foundations.
[Image: example input / output]
I've had some success using a pix2pix-like GAN architecture, but the output would ideally be a vector image rather than a bitmap.
I've therefore been trying to implement a different model, where the input image is convolved down to a 32x32 tensor, converted to a .svg file, rendered to a canvas, and then read back in as a bitmap image.
INPUT IMAGE -> CNN -> TENSOR -> SVG -> PNG -> TENSOR -> DISCRIMINATOR
[256,256,3] -> [32,32,3] -> temp.svg -> temp.png -> [256,256,3]
[Image: diagram of the pipeline]
I have the target image as both a bitmap and a .svg file, so it would be possible to convert the .svg directly to a representative tensor and train the network as a seq2seq problem. However, that would treat the data as a 'blind' 1D sequence and ignore the spatial relationships within the 2D images.
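For reference, that direct conversion could look something like the rough sketch below. It assumes the target .svg files contain only <polygon> elements (which is what my generation code further down produces), and the maximum sequence length and padding value are arbitrary choices:

import numpy as np
import xml.etree.ElementTree as ET

# Rough sketch of a direct .svg -> 1D tensor conversion for a seq2seq setup.
# Assumes the file contains only <polygon> elements in a 256x256 viewBox.
def svg_to_sequence(svg_path, max_values=512, pad_value=-1.0):
    coords = []
    svg_ns = '{http://www.w3.org/2000/svg}'
    for poly in ET.parse(svg_path).getroot().iter(svg_ns + 'polygon'):
        # "x1,y1 x2,y2 ..." -> [x1, y1, x2, y2, ...], normalised to [0, 1]
        for pair in poly.get('points').split():
            x, y = pair.split(',')
            coords.extend([float(x) / 256.0, float(y) / 256.0])
    coords = coords[:max_values]
    coords += [pad_value] * (max_values - len(coords))
    return np.asarray(coords, dtype=np.float32)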
I've therefore been trying to wrap the tensor -> svg -> png -> tensor conversion inside a custom Keras layer.
This is the (rather naive) code that converts a [32,32,3] tensor to a .svg file:
import io

import numpy as np
import svgwrite
import tensorflow as tf
from PIL import Image
from reportlab.graphics import renderPM
from svglib.svglib import svg2rlg


# Converts the tensor to a 3D array, and then to an .svg using svgwrite.
def convert_to_svg(input_tensor):
    array = input_tensor.numpy()  # tf.py_function hands us an eager tensor
    d = svgwrite.Drawing('temp.svg', size=('256px', '256px'), viewBox='0 0 256 256')
    for i in range(32):
        first_value = array[i, 0, 0]
        if first_value < 0:
            continue
        # Slice only the red channel from row i and scale to [0, 255].
        red_array = array[i, :, 0] * 255
        # Slice only the green channel and scale to [0, 255].
        green_array = array[i, :, 1] * 255
        # Interleave and flatten them: [r0, g0, r1, g1, ...].
        combined_array = np.dstack((red_array, green_array)).flatten()
        # Remove the first two and last two values of the combined array.
        clipped_array = np.delete(combined_array, [0, 1, 62, 63])
        # Keep only strictly positive values.
        filtered_array = clipped_array[clipped_array > 0]
        # Drop the first value if the array has an odd number of entries.
        if len(filtered_array) % 2 != 0:
            filtered_array = np.delete(filtered_array, 0)
        # Convert into a list of (x, y) tuples.
        l = filtered_array.tolist()
        t = list(zip(l[0::2], l[1::2]))
        if not t:
            continue
        # Add the polygon to the .svg, filled with a grey based on first_value.
        grey = int(first_value * 255)
        d.add(d.polygon(points=t, fill=svgwrite.rgb(grey, grey, grey)))
    d.save()
    return d
This next function renders the .svg to a .png, and then reads the resulting image back as a tensor:
def convert_to_tensor(input_svg):
    mem = io.BytesIO()
    drawing = svg2rlg(input_svg)
    renderPM.drawToFile(drawing, mem, fmt='PNG')
    mem.seek(0)  # rewind the buffer before PIL reads it
    array = np.array(Image.open(mem))
    array = (array / 127.5) - 1  # rescale pixel values to [-1, 1]
    tensor = tf.convert_to_tensor(array, tf.float32)
    return tensor
The two are combined in a function:
# Take the generator output and convert it to an .svg, then to a .png, then to a tensor.
def post_process(input_pp):
    convert_to_svg(input_pp[0, ...])  # drop the batch dimension; writes temp.svg
    with open('temp.svg') as temp_svg:
        t = convert_to_tensor(temp_svg)
    return t
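As a quick sanity check, the full round trip can be run eagerly on a random batch before involving any model (the shapes just mirror the pipeline above):

# Eager-mode smoke test of the tensor -> svg -> png -> tensor round trip.
dummy = tf.random.uniform([1, 32, 32, 3], minval=-1.0, maxval=1.0)
out = post_process(dummy)
print(out.shape)  # (256, 256, 3) if the rendered .png comes back as RGB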
Wrapped in a tf.py_function:
@tf.function
def tf_function(inputs):
    y = tf.py_function(post_process, [inputs], tf.float32)
    y.set_shape([None, 256, 256, 3])
    return y
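For comparison, the pattern I've mostly seen tf.py_function used for is preprocessing inside a tf.data pipeline, where nothing downstream needs a gradient. A rough sketch reusing post_process (the dataset here is just placeholder noise):

# tf.py_function as it's normally used: mapping over an input pipeline.
def map_fn(x):
    y = tf.py_function(post_process, [x], tf.float32)
    y.set_shape([256, 256, 3])  # post_process returns a single unbatched image
    return y

ds = tf.data.Dataset.from_tensor_slices(tf.random.uniform([4, 1, 32, 32, 3]))
ds = ds.map(map_fn)  # the Python round trip runs once per element when iterated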
And then called as a Lambda layer:
def lastlayer():
    result = tf.keras.Sequential()
    result.add(tf.keras.layers.Lambda(tf_function, output_shape=[256, 256, 3]))
    result.trainable = False
    return result
The model then looks like this:
def Generator():
    inputs = tf.keras.layers.Input(shape=[256, 256, 3])

    x = downsample(64, 4, apply_batchnorm=False)(inputs)  # (bs, 128, 128, 64)
    x = downsample(128, 4)(x)  # (bs, 64, 64, 128)
    x = downsample(256, 4)(x)  # (bs, 32, 32, 256)
    x = downsample(512, 4)(x)  # (bs, 16, 16, 512)
    x = downsample(512, 4)(x)  # (bs, 8, 8, 512)
    x = downsample(512, 4)(x)  # (bs, 4, 4, 512)
    x = downsample(512, 4)(x)  # (bs, 2, 2, 512)
    x = downsample(512, 4)(x)  # (bs, 1, 1, 512)

    x = upsample(512, 4, apply_dropout=True)(x)  # (bs, 2, 2, 512)
    x = upsample(512, 4, apply_dropout=True)(x)  # (bs, 4, 4, 512)
    x = upsample(512, 4, apply_dropout=True)(x)  # (bs, 8, 8, 512)
    x = upsample(512, 4)(x)  # (bs, 16, 16, 512)
    x = upsample(3, 4)(x)  # (bs, 32, 32, 3)

    x = lastlayer()(x)

    return tf.keras.Model(inputs=inputs, outputs=x)
The network compiles without issue, but I've not been able to get it to train. The two most common errors I've seen are:
TypeError: Tensor is unhashable. Instead, use tensor.ref() as the key.
Or:
ValueError: No gradients provided for any variable
The reason I'm not posting a specific stack trace (though I easily could) is that I think this is a question about the overall architecture.
Is what I've described actually possible? Is there a different way to do this sort of image manipulation within the graph itself? The function is obviously not differentiable, has nothing trainable, and won't have a gradient, but I thought that was exactly what the Lambda layer / tf.py_function combination was designed to accommodate.
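For what it's worth, the missing gradient can be reproduced in isolation. As I understand it, tf.py_function can only differentiate through TensorFlow ops executed inside the wrapped function, and the numpy/svgwrite/PIL round trip is invisible to autodiff:

# Minimal repro: no gradient flows back through the Python round trip.
x = tf.Variable(tf.random.uniform([1, 32, 32, 3]))
with tf.GradientTape() as tape:
    y = tf.py_function(post_process, [x], tf.float32)
    loss = tf.reduce_mean(y)
print(tape.gradient(loss, x))  # None -> "No gradients provided for any variable"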
Many thanks for any advice.
Source: https://stackoverflow.com/questions/61862256/using-tensorflow-keras-image-manipulation-inside-tf-py-function