I have a large 2D array K (roughly 5000x5000). I need to find the sums of all submatrices K[i:j, i:j] for all pairs (i, j) with i < j. Unfortunately, looping over i and j with two nested loops is too slow.
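For concreteness, here is a minimal sketch of the two-loop approach I mean. The function name `diagonal_block_sums_naive`, the output layout `S[i, j]`, and the tiny 4x4 test array are placeholders for illustration only, and I'm assuming NumPy-style half-open slices with i < j.

```python
import numpy as np

def diagonal_block_sums_naive(K):
    """Sum every diagonal block K[i:j, i:j] with two nested loops.

    Hypothetical helper for illustration: S[i, j] holds the sum of
    K[i:j, i:j] for 0 <= i < j <= n; all other entries stay zero.
    """
    n = K.shape[0]
    S = np.zeros((n + 1, n + 1), dtype=K.dtype)
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Each call re-sums the whole (j - i) x (j - i) block,
            # which is what makes this slow for large n.
            S[i, j] = K[i:j, i:j].sum()
    return S

# Small stand-in for the real ~5000x5000 array.
K = np.arange(16).reshape(4, 4)
S = diagonal_block_sums_naive(K)
```

Every `(i, j)` pair re-sums its block from scratch, so the total work grows roughly with the fourth power of the array size, which is why it becomes impractical at 5000x5000.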