I'd like to transpose a matrix b using MPI_Alltoallv and store it in bt.
Each process contains nlocal rows of b. For example:
Proc0: 0 | 10 | 20 | 30