Batch-major vs time-major LSTM
Question: Do RNNs learn different dependency patterns when the input is batch-major as opposed to time-major?

Answer 1: (Edit: sorry, my initial argument was why it makes sense, but I realized that it doesn't, so this is a little off-topic.) I haven't found the TF group's reasoning behind this, but it does not make computational sense, as the ops are written in C++. Intuitively, we want to mash up (multiply/add, etc.) different features from the same sequence at the same timestep. Different timesteps can't be computed in parallel anyway, since each step depends on the hidden state from the previous one.
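To make the layout difference concrete, here is a small NumPy sketch (my own illustration, not from the answer) contrasting the two orderings. The shapes and variable names are assumptions; the point is that when the data is stored time-major, the per-timestep slice the RNN cell consumes at each step is contiguous in memory, whereas in a batch-major array it is strided:

```python
import numpy as np

# Hypothetical toy dimensions for illustration.
batch, time, features = 4, 7, 3

# Batch-major layout: [batch, time, features]
x_bm = np.random.rand(batch, time, features).astype(np.float32)

# Time-major layout: [time, batch, features]. The copy via
# ascontiguousarray matters: a mere transposed view keeps the
# old strides, so we materialize the time-major ordering.
x_tm = np.ascontiguousarray(np.transpose(x_bm, (1, 0, 2)))

# At each step the cell consumes one timestep across the whole batch.
step = x_tm[0]                               # shape (batch, features)
print(step.flags["C_CONTIGUOUS"])            # contiguous slice
print(x_bm[:, 0, :].flags["C_CONTIGUOUS"])   # strided slice
```

This is the usual argument for why TensorFlow's RNN ops historically preferred `time_major=True` inputs: the per-step gather is a cheap contiguous read rather than a strided one. Whether that changes what the network *learns* is a separate question, and the answer above concedes its argument doesn't settle it.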