I have two tensors: inputs_tokens is a batch of token ids with shape [20, 300], and seq_A is my model's output with shape [20, 300, 512] (one 512-dimensional vector for each token in the batch).
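To make the shapes concrete, here is a minimal sketch. It assumes the tensors are PyTorch tensors, and it fills them with random stand-in data because the real token ids and model outputs aren't shown. The vocabulary size and the indices i, j are hypothetical values for illustration only. The point is that the first two dimensions line up position by position, so seq_A[i, j] is the 512-dim vector for the token whose id is inputs_tokens[i, j].

```python
import torch

# Hypothetical stand-ins with the shapes described above (assumes PyTorch).
batch_size, seq_len, hidden_dim = 20, 300, 512
vocab_size = 30000  # hypothetical vocabulary size, for illustration only

# inputs_tokens: one integer token id per position, shape [20, 300]
inputs_tokens = torch.randint(0, vocab_size, (batch_size, seq_len))

# seq_A: one 512-dim output vector per token, shape [20, 300, 512]
seq_A = torch.randn(batch_size, seq_len, hidden_dim)

# The first two dimensions correspond element by element:
# seq_A[i, j] is the vector produced for token id inputs_tokens[i, j].
i, j = 3, 42                    # hypothetical batch row and token position
token_id = inputs_tokens[i, j]  # 0-dim tensor holding a single id
token_vec = seq_A[i, j]         # tensor of shape [512]

print(inputs_tokens.shape)  # torch.Size([20, 300])
print(seq_A.shape)          # torch.Size([20, 300, 512])
print(token_vec.shape)      # torch.Size([512])
```

Any per-token operation (masking padding ids, selecting specific tokens, and so on) relies on this alignment between the two leading dimensions.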