I\'m trying to build a multitask image captioning model, which contains of two separate encoder-decoder models with lstms, each of which takes inputs from different datasets, an