I have written the following convolutional neural network (CNN) class in TensorFlow (some lines of code are omitted for clarity):
class CNN:
de
I encountered the same problem and could not solve it (without retraining) with any solution I found on the internet. What I did instead was load each model in its own thread, with both threads communicating with the main thread. The code is simple enough to write; you just have to be careful when you synchronize the threads. In my case, each thread received the input for its problem and returned the output to the main thread. It works without any observable overhead.
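A minimal sketch of that setup, not my exact code: each worker thread loads its own model and serves requests from a queue, and the main thread sends inputs and waits for the reply. The names `ModelWorker`, `load_model_a` and `load_model_b` are hypothetical, and the loaders return trivial stand-in functions so the example runs without TensorFlow; in practice each loader would restore one model inside its thread.

```python
import queue
import threading


class ModelWorker(threading.Thread):
    """Owns one model and serves prediction requests from its own thread."""

    def __init__(self, load_model):
        super().__init__(daemon=True)
        self._load_model = load_model    # hypothetical: callable returning a predict function
        self._requests = queue.Queue()   # main thread -> worker
        self._ready = threading.Event()  # set once loading has finished (or failed)
        self._load_error = None

    def run(self):
        try:
            predict = self._load_model()  # the model is loaded inside this thread
        except Exception as exc:
            self._load_error = exc
            return
        finally:
            self._ready.set()
        while True:
            item = self._requests.get()
            if item is None:              # sentinel: shut down
                break
            inputs, reply = item
            try:
                reply.put((True, predict(inputs)))
            except Exception as exc:      # hand errors back instead of dying silently
                reply.put((False, exc))

    def predict(self, inputs):
        """Called from the main thread; blocks until the worker answers."""
        self._ready.wait()
        if self._load_error is not None:
            raise self._load_error
        reply = queue.Queue(maxsize=1)    # one reply queue per request
        self._requests.put((inputs, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def stop(self):
        self._requests.put(None)
        self.join()


def load_model_a():
    # Stand-in for restoring the first model (assumption, not real model code).
    return lambda xs: [x * 2 for x in xs]


def load_model_b():
    # Stand-in for restoring the second model (assumption, not real model code).
    return lambda xs: [x + 1 for x in xs]


worker_a = ModelWorker(load_model_a)
worker_b = ModelWorker(load_model_b)
worker_a.start()
worker_b.start()

out_a = worker_a.predict([1, 2, 3])
out_b = worker_b.predict([1, 2, 3])

worker_a.stop()
worker_b.stop()
```

The queues handle the synchronization: the main thread never touches a model directly, and each request carries its own reply queue, so outputs cannot get mixed up between the two models.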