I am using NASNetLarge as the CNN encoder for image captioning. What size should the LSTM hidden dimension be? Thanks.