Direct way to generate sum of all parallel diagonals in Numpy / Pandas?

深忆病人 2021-02-14 12:34

I have a rectangular (can't be assumed to be square) Pandas DataFrame of numbers. Say I pick a diagonal direction (either "upperleft to lowerright" or "upperright to lowerl

3 answers
  •  执笔经年
    2021-02-14 13:29

    Short answer

    See the fast, but complicated function at the end.

    Development

    Iterating over the traces works, but I'm not sure it is better than the pandas solution. Both involve iteration, over diagonals or over columns. Conceptually, iterating over the traces is simpler and cleaner, but I'm not sure about speed, especially on large arrays.

    Each diagonal has a different length: [[12], [9, 13], ...]. That is a big red flag, warning us that a block (whole-array) operation is difficult, if not impossible.
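    To make the ragged shape concrete, here is a small sketch that lists every diagonal with np.diagonal. It assumes x = np.arange(15).reshape(5, 3), which is inferred from the outputs in this answer rather than stated anywhere. Offsets follow the np.trace/np.diagonal convention, where a negative offset means below the main diagonal.

```python
import numpy as np

# Assumed example array, inferred from the outputs shown in this answer.
x = np.arange(15).reshape(5, 3)
nrows, ncols = x.shape

# Offsets run from the bottom-left corner, -(nrows - 1), to the top-right corner, ncols - 1.
offsets = range(-(nrows - 1), ncols)
diagonals = [np.diagonal(x, offset=k) for k in offsets]

print([d.tolist() for d in diagonals])
# [[12], [9, 13], [6, 10, 14], [3, 7, 11], [0, 4, 8], [1, 5], [2]]
print([len(d) for d in diagonals])
# [1, 2, 3, 3, 3, 2, 1]
```

    The lengths rise and then fall, so the diagonals cannot be stacked into one rectangular array without padding them out.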

    With scipy.sparse I can construct a 2d array that can be summed to give these traces:

    In [295]: from scipy import sparse
    In [296]: xs=sparse.dia_matrix(x)
    In [297]: xs.data
    Out[297]: 
    array([[12,  0,  0],
           [ 9, 13,  0],
           [ 6, 10, 14],
           [ 3,  7, 11],
           [ 0,  4,  8],
           [ 0,  1,  5],
           [ 0,  0,  2]])
    In [298]: np.sum(xs.data,axis=1)
    Out[298]: array([12, 22, 30, 21, 12,  6,  2])
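    One caveat, as far as I can tell from how SciPy converts a dense array (it goes through the nonzero entries): a diagonal that is entirely zero is not stored at all. In that case np.sum(xs.data, axis=1) comes back shorter than nrows + ncols - 1 and no longer lines up with the diagonals. The sketch below uses xs.offsets (the offset, col - row, of each stored row of xs.data) to put every sum in its own slot. dia_traces is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse

def dia_traces(x):
    """Sum every upperleft-to-lowerright diagonal, bottom-left diagonal first."""
    x = np.asarray(x)
    nrows, ncols = x.shape
    xs = sparse.dia_matrix(x)
    # One slot per possible diagonal, offsets -(nrows - 1) .. ncols - 1.
    sums = np.zeros(nrows + ncols - 1, dtype=xs.data.dtype)
    # xs.data has one row per stored diagonal; xs.offsets holds that row's offset.
    # Adding nrows - 1 maps the bottom-left offset to index 0.
    sums[xs.offsets + nrows - 1] = xs.data.sum(axis=1)
    return sums
```

    For example, dia_traces(np.eye(2, dtype=int)) gives [0, 2, 0], while summing xs.data directly should report only the single stored diagonal.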
    

    This sparse format stores its data in a 2d array, with the necessary shifts. In fact your pd.concat produces something similar:

    In [304]: pd.concat([rectdf.iloc[:, i].shift(-i) for i in range(rectdf.shape[1])], axis=1)
    Out[304]: 
        0   1   2
    0   0   4   8
    1   3   7  11
    2   6  10  14
    3   9  13 NaN
    4  12 NaN NaN
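    That concat only recovers the main diagonal and the ones below it: shift(-i) pushes the upper diagonals ([1, 5] and [2] here) off the top of the frame. One way to keep them with the same shift idea is to pad ncols - 1 all-NaN rows on top with reindex before shifting. The sketch below does this. diagonal_sums_shift is a hypothetical name, and the frame is rebuilt with positional 0-based labels so that reindex does not depend on df's own index.

```python
import numpy as np
import pandas as pd

def diagonal_sums_shift(df):
    """Sum every upperleft-to-lowerright diagonal, bottom-left diagonal first."""
    nrows, ncols = df.shape
    # Positional labels: rows 0..nrows-1, columns 0..ncols-1.
    frame = pd.DataFrame(df.to_numpy())
    # Prepend ncols - 1 all-NaN rows (labels -(ncols-1) .. -1) so shift(-i)
    # no longer drops the diagonals above the main one.
    padded = frame.reindex(range(-(ncols - 1), nrows))
    # Shift column i up by i rows; afterwards each row holds exactly one diagonal.
    aligned = pd.concat([padded.iloc[:, i].shift(-i) for i in range(ncols)], axis=1)
    # Row sums (NaN skipped) run from the top-right diagonal to the bottom-left one;
    # reverse them to match the np.trace offset order -(nrows - 1) .. ncols - 1.
    return aligned.sum(axis=1).to_numpy()[::-1]
```

    The sums come out as floats because the padding introduces NaN.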
    

    It looks like sparse creates this data array by starting with np.zeros and filling it with the appropriate indexing:

     data[row_indices, col_indices] = x.ravel()
    

    something like:

    In [344]: i=[4,5,6,3,4,5,2,3,4,1,2,3,0,1,2]
    In [345]: j=[0,1,2,0,1,2,0,1,2,0,1,2,0,1,2]
    In [346]: z=np.zeros((7,3),int)
    In [347]: z[i,j]=x.ravel()[:len(i)]
    In [348]: z
    Out[348]: 
    array([[12,  0,  0],
           [ 9, 13,  0],
           [ 6, 10, 14],
           [ 3,  7, 11],
           [ 0,  4,  8],
           [ 0,  1,  5],
           [ 0,  0,  2]])
    

    though I still need a way of creating i,j for any shape. For j it is easy:

    j=np.tile(np.arange(3),5)
    j=np.tile(np.arange(x.shape[1]),x.shape[0])
    

    Reshaping i

    In [363]: np.array(i).reshape(-1,3)
    Out[363]: 
    array([[4, 5, 6],
           [3, 4, 5],
           [2, 3, 4],
           [1, 2, 3],
           [0, 1, 2]])
    

    leads me to recreating it with:

    In [371]: ii=(np.arange(3)+np.arange(5)[::-1,None]).ravel()
    In [372]: ii
    Out[372]: array([4, 5, 6, 3, 4, 5, 2, 3, 4, 1, 2, 3, 0, 1, 2])
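    Another way to see the same index: element (r, c) lies on the diagonal with offset c - r (the offset np.trace uses), and ii is just that offset shifted by nrows - 1, so the bottom-left diagonal lands in row 0 of z. Here is a quick check with np.indices, using the 5x3 shape of the example x:

```python
import numpy as np

nrows, ncols = 5, 3  # shape of the example x in this answer
# rows[r, c] == r and cols[r, c] == c for every cell, in row-major order.
rows, cols = np.indices((nrows, ncols))

jj = cols.ravel()  # same as np.tile(np.arange(ncols), nrows)
# Offset c - r ranges over -(nrows-1) .. ncols-1; adding nrows - 1 maps it to 0 .. nrows+ncols-2.
ii = (cols - rows + nrows - 1).ravel()

print(ii.tolist())  # [4, 5, 6, 3, 4, 5, 2, 3, 4, 1, 2, 3, 0, 1, 2]
```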
    

    So together:

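    import numpy as np

    def all_traces(x):
        nrows, ncols = x.shape
        # jj: column of each element of x.ravel(); ii: its diagonal, (c - r) + (nrows - 1)
        jj = np.tile(np.arange(ncols), nrows)
        ii = (np.arange(ncols) + np.arange(nrows)[::-1, None]).ravel()
        # one row of z per diagonal; dtype=x.dtype so float input is not truncated to int
        z = np.zeros((nrows + ncols - 1, ncols), dtype=x.dtype)
        z[ii, jj] = x.ravel()
        # rows run from the bottom-left diagonal to the top-right one
        return z.sum(axis=1)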
    

    It needs more testing over a variety of shapes.
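    One way to do that testing is to compare against the np.trace list comprehension used in the timing comparison. The comparison runs on random integer arrays that are tall, wide, square, and single-row/column. This is a sketch: the helper name check_against_trace and the list of shapes are hypothetical choices, and the copy of all_traces uses dtype=x.dtype rather than int so float input is not truncated.

```python
import numpy as np

def all_traces(x):
    # Copy of all_traces, with dtype=x.dtype instead of int.
    nrows, ncols = x.shape
    jj = np.tile(np.arange(ncols), nrows)
    ii = (np.arange(ncols) + np.arange(nrows)[::-1, None]).ravel()
    z = np.zeros((nrows + ncols - 1, ncols), dtype=x.dtype)
    z[ii, jj] = x.ravel()
    return z.sum(axis=1)

def check_against_trace(func, shapes=((5, 3), (3, 5), (4, 4), (1, 6), (6, 1), (1, 1)), seed=0):
    """Compare func(x) with the np.trace loop on random integer arrays of each shape."""
    rng = np.random.default_rng(seed)
    for shape in shapes:
        x = rng.integers(-10, 10, size=shape)
        expected = [np.trace(x, k) for k in range(-(shape[0] - 1), shape[1])]
        # Raises AssertionError with a diff if any diagonal sum differs.
        np.testing.assert_array_equal(func(x), expected)
    return True

check_against_trace(all_traces)
```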

    This function is faster than iterating over the traces, even for an array this small:

    In [387]: timeit all_traces(x)
    10000 loops, best of 3: 70.5 µs per loop
    In [388]: timeit [np.trace(x,i) for i in range(-(x.shape[0]-1),x.shape[1])]
    10000 loops, best of 3: 106 µs per loop
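    Since z exists only to be summed along axis 1, np.bincount with weights can add each element straight into its diagonal's bin and skip the 2-D array. This swaps in a different technique from the one used above and is not timed here. Note that bincount always returns float64. The same idea also covers the other direction in the question (upperright to lowerleft), because every cell on such a diagonal shares the same r + c. Both function names below are hypothetical.

```python
import numpy as np

def all_traces_bincount(x):
    """Upperleft-to-lowerright diagonal sums, bottom-left diagonal first (float64 result)."""
    x = np.asarray(x)
    nrows, ncols = x.shape
    # Same ii as in all_traces: offset (c - r) shifted by nrows - 1.
    ii = (np.arange(ncols) + np.arange(nrows)[::-1, None]).ravel()
    # bincount adds each weight into bin ii[k]; minlength makes the bin count explicit.
    return np.bincount(ii, weights=x.ravel(), minlength=nrows + ncols - 1)

def all_antitraces_bincount(x):
    """Upperright-to-lowerleft diagonal sums, top-left corner first (float64 result)."""
    x = np.asarray(x)
    nrows, ncols = x.shape
    # Every cell on one upperright-to-lowerleft diagonal shares the same r + c.
    rows, cols = np.indices((nrows, ncols))
    return np.bincount((rows + cols).ravel(), weights=x.ravel(), minlength=nrows + ncols - 1)
```

    As a cross-check, the anti-diagonal sums should equal the np.trace sums of np.fliplr(x), taken from offset ncols - 1 down to -(nrows - 1).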
    
