I realize now that implementing it like this would have been a good idea. However, I have an already trained and fine-tuned autoencoder that looks like this:
Mode