My program works properly with two GPUs, but it raises this error when it reaches torch.bmm:
energy = torch.bmm(proj_query, proj_key)  # transpose check
RuntimeError: cublas
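A cuBLAS error inside torch.bmm can hide a plain input mismatch, so one way to narrow it down is to validate the operands before the call. The sketch below is a hypothetical debugging helper (`checked_bmm` is not part of PyTorch) that checks the documented torch.bmm contract: both tensors are 3-D, with shapes (b, n, m) and (b, m, p), on the same device and with the same dtype. The tensor shapes in the usage lines are assumptions, because the post does not give the real sizes of proj_query and proj_key. Passing these checks does not rule out a genuine cuBLAS or driver problem.

```python
import torch


def checked_bmm(proj_query: torch.Tensor, proj_key: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: validate torch.bmm inputs, then multiply."""
    # torch.bmm only accepts 3-D tensors: (b, n, m) @ (b, m, p) -> (b, n, p)
    if proj_query.dim() != 3 or proj_key.dim() != 3:
        raise ValueError(f"expected 3-D tensors, got {proj_query.dim()}-D and {proj_key.dim()}-D")
    # Batch sizes must match exactly (bmm does not broadcast)
    if proj_query.size(0) != proj_key.size(0):
        raise ValueError(f"batch mismatch: {proj_query.size(0)} vs {proj_key.size(0)}")
    # Inner dimensions must agree: m of the query must equal m of the key
    if proj_query.size(2) != proj_key.size(1):
        raise ValueError(f"inner dim mismatch: {tuple(proj_query.shape)} @ {tuple(proj_key.shape)}")
    # Both operands must live on the same device (e.g. the same GPU replica)
    if proj_query.device != proj_key.device:
        raise ValueError(f"device mismatch: {proj_query.device} vs {proj_key.device}")
    # Both operands must share a dtype (e.g. both float32)
    if proj_query.dtype != proj_key.dtype:
        raise ValueError(f"dtype mismatch: {proj_query.dtype} vs {proj_key.dtype}")
    # Contiguous copies after permute/view; an assumed precaution, not a guaranteed fix
    return torch.bmm(proj_query.contiguous(), proj_key.contiguous())


# Usage with assumed shapes: batch 2, N = 16 positions, 4 channels
proj_query = torch.randn(2, 4, 16).permute(0, 2, 1)  # (2, 16, 4), the "transpose"
proj_key = torch.randn(2, 4, 16)                     # (2, 4, 16)
energy = checked_bmm(proj_query, proj_key)           # (2, 16, 16)
```

Running the failing step with the environment variable CUDA_LAUNCH_BLOCKING=1 also makes CUDA errors surface at the call that actually failed, which helps confirm that torch.bmm is really the culprit.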