Never mind, I figured it out. swapaxes(1, 2) was the missing piece I needed. The answer is just to do mat.swapaxes(1, 2).reshape(N*Q, N*Q).
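For anyone landing here later, here is a minimal sketch of why this works. It assumes mat has shape (N, N, Q, Q), with mat[i, j] being the Q x Q block at block-row i and block-column j; that layout is my assumption for illustration, and the sizes below are made up. It cross-checks the result against np.block.

```python
import numpy as np

# Assumed layout (for illustration): mat[i, j] is the Q x Q block
# at block-row i, block-column j, so mat has shape (N, N, Q, Q).
N, Q = 2, 3  # illustrative sizes
mat = np.arange(N * N * Q * Q).reshape(N, N, Q, Q)

# Swap the block-column axis with the within-block row axis, giving
# shape (N, Q, N, Q), then merge each adjacent pair of axes.
flat = mat.swapaxes(1, 2).reshape(N * Q, N * Q)

# Cross-check against np.block, which assembles the same block matrix.
expected = np.block([[mat[i, j] for j in range(N)] for i in range(N)])
assert np.array_equal(flat, expected)
```

Without the swapaxes call, reshape would merge the two block indices into the rows and the two within-block indices into the columns, giving shape (N*N, Q*Q) at best, which is not the block matrix.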
I feel foolish for posting before spending enough time trying to figure it out myself, but I'll leave this up so others can benefit from it.