Question
I'm using Tensorflow 2.0 / Keras to generate building footprints for a procedural city.
I would like to feed in a source image representing a city 'block', and have the network return the predicted building foundations.
[Image: example input / output]
I've had some success using a pix2pix-like GAN architecture, but the output would ideally be a vector image rather than a bitmap.
I've therefore been trying to implement a different model, where the input image is convolved down to a 32x32 tensor, converted to a .svg file, rendered to a canvas, and then read back in as a bitmap image.
INPUT IMAGE -> CNN -> TENSOR -> SVG -> PNG -> TENSOR -> DISCRIMINATOR
[256,256,3] -> [32,32,3] -> temp.svg -> temp.png -> [256,256,3]
[Image: diagram of the pipeline]
I have the target image as both a bitmap and a .svg file, so it would be possible to convert the .svg directly to a representative tensor and train the network as a seq2seq problem. However, that would treat the data as a 'blind' 1D sequence and ignore the spatial relationships within the 2D images.
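For reference, that direct conversion could look something like the rough sketch below. It assumes the target .svg files contain only <polygon> elements (which is what my generation code further down produces), and the maximum sequence length and padding value are arbitrary choices:

import numpy as np
import xml.etree.ElementTree as ET

# Rough sketch of a direct .svg -> 1D tensor conversion for a seq2seq setup.
# Assumes the file contains only <polygon> elements in a 256x256 viewBox.
def svg_to_sequence(svg_path, max_values=512, pad_value=-1.0):
    coords = []
    svg_ns = '{http://www.w3.org/2000/svg}'
    for poly in ET.parse(svg_path).getroot().iter(svg_ns + 'polygon'):
        # "x1,y1 x2,y2 ..." -> [x1, y1, x2, y2, ...], normalised to [0, 1]
        for pair in poly.get('points').split():
            x, y = pair.split(',')
            coords.extend([float(x) / 256.0, float(y) / 256.0])
    coords = coords[:max_values]
    coords += [pad_value] * (max_values - len(coords))
    return np.asarray(coords, dtype=np.float32)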
I've therefore been trying to wrap the tensor -> svg -> png -> tensor conversion inside a custom Keras layer.
This is the (rather naive) code that converts a [32,32,3] tensor to a .svg file:
import io

import numpy as np
import svgwrite
import tensorflow as tf
from PIL import Image
from reportlab.graphics import renderPM
from svglib.svglib import svg2rlg


# Converts the tensor to a 3D array, and then to an .svg using svgwrite.
def convert_to_svg(input_tensor):
    array = input_tensor.numpy()  # tf.py_function hands us an eager tensor
    d = svgwrite.Drawing('temp.svg', size=('256px', '256px'), viewBox='0 0 256 256')
    for i in range(32):
        first_value = array[i, 0, 0]
        if first_value < 0:
            continue
        # Slice only the red channel from row i and scale to [0, 255].
        red_array = array[i, :, 0] * 255
        # Slice only the green channel and scale to [0, 255].
        green_array = array[i, :, 1] * 255
        # Interleave and flatten them: [r0, g0, r1, g1, ...].
        combined_array = np.dstack((red_array, green_array)).flatten()
        # Remove the first two and last two values of the combined array.
        clipped_array = np.delete(combined_array, [0, 1, 62, 63])
        # Keep only strictly positive values.
        filtered_array = clipped_array[clipped_array > 0]
        # Drop the first value if the array has an odd number of entries.
        if len(filtered_array) % 2 != 0:
            filtered_array = np.delete(filtered_array, 0)
        # Convert into a list of (x, y) tuples.
        l = filtered_array.tolist()
        t = list(zip(l[0::2], l[1::2]))
        if not t:
            continue
        # Add the polygon to the .svg, filled with a grey based on first_value.
        grey = int(first_value * 255)
        d.add(d.polygon(points=t, fill=svgwrite.rgb(grey, grey, grey)))
    d.save()
    return d
This next function renders the .svg to a .png, and then reads the resulting image back as a tensor:
def convert_to_tensor(input_svg):
    mem = io.BytesIO()
    drawing = svg2rlg(input_svg)
    renderPM.drawToFile(drawing, mem, fmt='PNG')
    mem.seek(0)  # rewind the buffer before PIL reads it
    array = np.array(Image.open(mem))
    array = (array / 127.5) - 1  # rescale pixel values to [-1, 1]
    tensor = tf.convert_to_tensor(array, tf.float32)
    return tensor
The two are combined in a function:
# Take the generator output and convert it to an .svg, then to a .png, then to a tensor.
def post_process(input_pp):
    convert_to_svg(input_pp[0, ...])  # drop the batch dimension; writes temp.svg
    with open('temp.svg') as temp_svg:
        t = convert_to_tensor(temp_svg)
    return t
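As a quick sanity check, the full round trip can be run eagerly on a random batch before involving any model (the shapes just mirror the pipeline above):

# Eager-mode smoke test of the tensor -> svg -> png -> tensor round trip.
dummy = tf.random.uniform([1, 32, 32, 3], minval=-1.0, maxval=1.0)
out = post_process(dummy)
print(out.shape)  # (256, 256, 3) if the rendered .png comes back as RGB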
Wrapped in a tf.py_function:
@tf.function
def tf_function(inputs):
    y = tf.py_function(post_process, [inputs], tf.float32)
    y.set_shape([None, 256, 256, 3])
    return y
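For comparison, the pattern I've mostly seen tf.py_function used for is preprocessing inside a tf.data pipeline, where nothing downstream needs a gradient. A rough sketch reusing post_process (the dataset here is just placeholder noise):

# tf.py_function as it's normally used: mapping over an input pipeline.
def map_fn(x):
    y = tf.py_function(post_process, [x], tf.float32)
    y.set_shape([256, 256, 3])  # post_process returns a single unbatched image
    return y

ds = tf.data.Dataset.from_tensor_slices(tf.random.uniform([4, 1, 32, 32, 3]))
ds = ds.map(map_fn)  # the Python round trip runs once per element when iterated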
And then called as a Lambda layer:
def lastlayer():
    result = tf.keras.Sequential()
    result.add(tf.keras.layers.Lambda(tf_function, output_shape=[256, 256, 3]))
    result.trainable = False
    return result
The model then looks like this:
def Generator():
    inputs = tf.keras.layers.Input(shape=[256, 256, 3])

    x = downsample(64, 4, apply_batchnorm=False)(inputs)  # (bs, 128, 128, 64)
    x = downsample(128, 4)(x)  # (bs, 64, 64, 128)
    x = downsample(256, 4)(x)  # (bs, 32, 32, 256)
    x = downsample(512, 4)(x)  # (bs, 16, 16, 512)
    x = downsample(512, 4)(x)  # (bs, 8, 8, 512)
    x = downsample(512, 4)(x)  # (bs, 4, 4, 512)
    x = downsample(512, 4)(x)  # (bs, 2, 2, 512)
    x = downsample(512, 4)(x)  # (bs, 1, 1, 512)

    x = upsample(512, 4, apply_dropout=True)(x)  # (bs, 2, 2, 512)
    x = upsample(512, 4, apply_dropout=True)(x)  # (bs, 4, 4, 512)
    x = upsample(512, 4, apply_dropout=True)(x)  # (bs, 8, 8, 512)
    x = upsample(512, 4)(x)  # (bs, 16, 16, 512)
    x = upsample(3, 4)(x)  # (bs, 32, 32, 3)

    x = lastlayer()(x)

    return tf.keras.Model(inputs=inputs, outputs=x)
The network compiles without issue, but I've not been able to get it to train. The two most common errors I've seen are:
TypeError: Tensor is unhashable. Instead, use tensor.ref() as the key.
Or:
ValueError: No gradients provided for any variable
The reason I'm not posting a specific stack trace (though I easily could) is that I think this is a question about the overall architecture.
Is what I've described actually possible? Is there a different way to do this sort of image manipulation within the graph itself? The function is obviously not differentiable, has nothing trainable, and won't have a gradient, but I thought that was exactly what the Lambda layer / tf.py_function combination was designed to accommodate.
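For what it's worth, the missing gradient can be reproduced in isolation. As I understand it, tf.py_function can only differentiate through TensorFlow ops executed inside the wrapped function, and the numpy/svgwrite/PIL round trip is invisible to autodiff:

# Minimal repro: no gradient flows back through the Python round trip.
x = tf.Variable(tf.random.uniform([1, 32, 32, 3]))
with tf.GradientTape() as tape:
    y = tf.py_function(post_process, [x], tf.float32)
    loss = tf.reduce_mean(y)
print(tape.gradient(loss, x))  # None -> "No gradients provided for any variable"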
Many thanks for any advice.
Source: https://stackoverflow.com/questions/61862256/using-tensorflow-keras-image-manipulation-inside-tf-py-function