Efficiently construct FEM/FVM matrix

花落未央 2021-01-13 20:17

This is a typical use case for FEM/FVM equation systems, so is perhaps of broader interest. From a triangular mesh à la

I would like to create a scipy

2 Answers
  •  隐瞒了意图╮
    2021-01-13 20:41

    I would try creating the CSR structure directly, especially if you are already resorting to np.unique: it returns sorted keys, which is half the job done.

    I'm assuming you are at the point where you have i, j sorted lexicographically and the values v of duplicate entries summed, using np.add.at with the optional inverse output of np.unique.

    Then v and j are already in CSR format. All that's left is to create indptr, which you get from np.searchsorted(i, np.arange(M+1)), where M is the number of rows. You can pass these three arrays directly to the sparse.csr_matrix constructor.
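The steps described above can be sketched as follows. This is a minimal illustration, not the code timed in this answer: the function name csr_from_triplets and the linear key i * N + j are assumptions introduced here, and it presumes non-negative indices that fit in int64 after the multiplication.

```python
import numpy as np
from scipy import sparse


def csr_from_triplets(i, j, v, shape):
    """Build a CSR matrix from (i, j, v) triplets, summing duplicates.

    Hypothetical helper: follows the np.unique / np.add.at / np.searchsorted
    recipe from the prose above.
    """
    M, N = shape
    i = np.asarray(i, dtype=np.int64)
    j = np.asarray(j, dtype=np.int64)
    v = np.asarray(v, dtype=float)

    # One scalar key per entry; sorting the keys sorts by row, then column.
    keys = i * N + j

    # uniq is sorted; inverse maps every entry to its slot in uniq.
    uniq, inverse = np.unique(keys, return_inverse=True)

    # Accumulate the values of duplicate (i, j) pairs.
    data = np.zeros(len(uniq))
    np.add.at(data, inverse, v)

    # Recover row and column indices of the distinct entries.
    rows, cols = np.divmod(uniq, N)

    # indptr[r] is the position of the first entry belonging to row r.
    indptr = np.searchsorted(rows, np.arange(M + 1))

    return sparse.csr_matrix((data, cols, indptr), shape=shape)


i = np.array([0, 0, 1, 2, 2, 0])
j = np.array([1, 1, 2, 0, 0, 1])
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
A = csr_from_triplets(i, j, v, (3, 3))
print(A.toarray())
```

np.add.at is known to be comparatively slow for large inputs, which is one reason the timed code further down sums duplicates with np.add.reduceat on sorted values instead.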

    OK, let the code speak:

    import numpy as np
    from scipy import sparse
    from timeit import timeit
    
    
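    def tocsr(I, J, E, N):
        n = len(I)
        # Pack each (I, J) pair into one int64 key: I in the high 32 bits,
        # J in the low 32 bits (this view trick assumes a little-endian machine).
        K = np.empty((n,), dtype=np.int64)
        K.view(np.int32).reshape(n, 2).T[...] = J, I
        # Sorting the keys orders the entries by row, then by column.
        S = np.argsort(K)
        KS = K[S]
        # Start index of each run of equal keys, i.e. of each distinct (I, J).
        steps = np.flatnonzero(np.r_[1, np.diff(KS)])
        # Sum the values of duplicate entries.
        ED = np.add.reduceat(E[S], steps)
        # Unpack the distinct keys into column (JD) and row (ID) indices.
        JD, ID = KS[steps].view(np.int32).reshape(-1, 2).T
        # Convert the sorted row indices into the CSR row pointer.
        ID = np.searchsorted(ID, np.arange(N+1))
        return sparse.csr_matrix((ED, np.array(JD, dtype=int), ID), (N, N))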
    
    def viacoo(I, J, E, N):
        return sparse.coo_matrix((E, (I, J)), (N, N)).tocsr()
    
    
    # testing and timing
    
    # correctness
    N = 1000
    A = np.random.random((N, N)) < 0.001
    I, J = np.where(A)
    
    E = np.random.random((2, len(I)))
    D = np.zeros((2,) + A.shape)
    D[:, I, J] = E
    # .toarray() replaces the .A shorthand, which recent SciPy releases no longer provide
    D2 = tocsr(np.r_[I, I], np.r_[J, J], E.ravel(), N).toarray()
    
    print('correct:', np.allclose(D.sum(axis=0), D2))
    
    # speed
    N = 100000
    K = 10
    
    I, J = np.random.randint(0, N, (2, K*N))
    E = np.random.random((2 * len(I),))
    I, J, E = np.r_[I, I, J, J], np.r_[J, J, I, I], np.r_[E, E]
    
    print('N:', N, ' --  nnz (with duplicates):', len(E))
    print('direct: ', timeit('f(a,b,c,d)', number=10, globals={'f': tocsr, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
    print('via coo:', timeit('f(a,b,c,d)', number=10, globals={'f': viacoo, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
    

    Prints:

    correct: True
    N: 100000  --  nnz (with duplicates): 4000000
    direct:  7.702431229001377 secs for 10 iterations
    via coo: 41.813509466010146 secs for 10 iterations
    

    Speedup: roughly 5x (41.8 s vs. 7.7 s for 10 iterations).
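The int32 view that tocsr uses to build its sort keys depends on little-endian byte order. As a hedged alternative, the same keys can be built with shifts, which does not depend on byte order. The helper names pack_keys and unpack_keys are hypothetical, and the sketch assumes non-negative indices below 2**31.

```python
import numpy as np


def pack_keys(I, J):
    """Combine row index I (high 32 bits) and column index J (low 32 bits)."""
    return (np.asarray(I, dtype=np.int64) << 32) | np.asarray(J, dtype=np.int64)


def unpack_keys(K):
    """Inverse of pack_keys: return (I, J) as int64 arrays."""
    return K >> 32, K & 0xFFFFFFFF


I = np.array([2, 0, 1, 0, 2])
J = np.array([1, 3, 0, 3, 0])

K = pack_keys(I, J)

# Sorting the packed keys is a lexicographic sort by (I, J).
order = np.argsort(K, kind="stable")
print(order)

I_back, J_back = unpack_keys(K)
```

With a stable sort, the resulting order matches np.lexsort((J, I)), but only a single int64 array has to be sorted.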
