Efficiently construct FEM/FVM matrix

花落未央 2021-01-13 20:17

This is a typical use case for FEM/FVM equation systems, so is perhaps of broader interest. From a triangular mesh à la

I would like to create a scipy

2 answers
  • 2021-01-13 20:28

    So, in the end this turned out to be the difference between COO's and CSR's sum_duplicates (just like @hpaulj suspected). Thanks to the efforts of everyone involved here (particularly @paul-panzer), a PR is underway to give tocsr a tremendous speedup.
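    To make the role of duplicates concrete, here is a minimal sketch (toy data, not the mesh from the question) showing that COO stores repeated (i, j) entries separately and that the conversion to CSR sums them:

```python
import numpy as np
from scipy import sparse

# Toy triplets: two of the three entries land on the same position (0, 0).
I = np.array([0, 0, 1])
J = np.array([0, 0, 1])
V = np.array([1.0, 2.0, 3.0])

coo = sparse.coo_matrix((V, (I, J)), shape=(2, 2))
stored_before = coo.nnz   # COO keeps the duplicates as separate stored entries

csr = coo.tocsr()         # the conversion sums the duplicates
dense = csr.toarray()
print(stored_before, csr.nnz)
print(dense)
```

    In FEM/FVM assembly almost every node is shared between several cells, so this summation step touches most of the entries.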

    SciPy's tocsr does a lexsort on (I, J), so it helps to organize the indices in such a way that (I, J) come out fairly sorted already.
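    For readers unfamiliar with the term, a lexicographic sort on (I, J) orders the entries by row first and by column within each row. A small sketch with np.lexsort on made-up indices (this only illustrates the ordering, not SciPy's internal code):

```python
import numpy as np

# Made-up row and column indices.
I = np.array([2, 0, 1, 0, 2])
J = np.array([1, 3, 0, 1, 0])

# np.lexsort uses the LAST key as the primary one,
# so pass (J, I) to sort by row first, then by column.
order = np.lexsort((J, I))
I_sorted, J_sorted = I[order], J[order]
print(I_sorted)
print(J_sorted)
```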

    For nx=4, ny=2 in the above example, I and J are

    [1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
    [1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
    

    First sorting each row of cells, then the rows by the first column like

    cells = numpy.sort(cells, axis=1)
    cells = cells[cells[:, 0].argsort()]
    

    produces

    [1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
    [1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
    

    For the numbers in the original post, sorting cuts down the runtime from about 40 seconds to 8 seconds.

    Perhaps an even better ordering can be achieved if the nodes are numbered more appropriately in the first place. I'm thinking of Cuthill-McKee and friends.
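    As a rough sketch of that idea, SciPy ships scipy.sparse.csgraph.reverse_cuthill_mckee, which returns a node permutation for a sparse adjacency graph. The toy cells array below is hypothetical, and whether this renumbering actually speeds up assembly for a given mesh is untested here:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Hypothetical toy mesh: four triangles with scattered node numbers.
cells = np.array([[0, 5, 3], [5, 3, 1], [3, 1, 4], [1, 4, 2]])
n_nodes = cells.max() + 1

# Node adjacency graph: connect every pair of nodes that share a cell.
rows = cells[:, [0, 1, 2, 1, 2, 0]].ravel()
cols = cells[:, [1, 2, 0, 0, 1, 2]].ravel()
graph = sparse.coo_matrix(
    (np.ones(len(rows)), (rows, cols)), shape=(n_nodes, n_nodes)
).tocsr()

# perm[k] is the old index of the node that receives the new number k.
perm = reverse_cuthill_mckee(graph, symmetric_mode=True)

# Invert the permutation to map old node numbers to new ones, then relabel.
new_number = np.empty(n_nodes, dtype=perm.dtype)
new_number[perm] = np.arange(n_nodes)
cells_renumbered = new_number[cells]
print(cells_renumbered)
```

    The renumbered cells can then go through the same row sort and assembly as before.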

  • 2021-01-13 20:41

    I would try creating the csr structure directly, especially if you are resorting to np.unique since this gives you sorted keys, which is half the job done.

    I'm assuming you are at the point where you have i, j sorted lexicographically and overlapping v summed using np.add.at on the optional inverse output of np.unique.

    Then v and j are already in csr format. All that's left to do is to create the indptr, which you get from np.searchsorted(i, np.arange(M+1)), where M is the number of rows (the length of a column). You can pass these directly to the sparse.csr_matrix constructor.
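    The steps just described can be sketched as follows. The helper name csr_from_triplets and the trick of encoding each (i, j) pair as i * n_cols + j are assumptions made for this illustration; the timed code further down uses a different packing:

```python
import numpy as np
from scipy import sparse

def csr_from_triplets(i, j, v, shape):
    """Hypothetical helper: sum duplicate (i, j) entries and build CSR directly."""
    n_rows, n_cols = shape
    # Encode each (row, col) pair as one integer; sorted keys are sorted by row, then column.
    key = i.astype(np.int64) * n_cols + j
    uniq, inverse = np.unique(key, return_inverse=True)
    # Sum all values that share a key.
    data = np.zeros(len(uniq), dtype=v.dtype)
    np.add.at(data, inverse, v)
    rows, cols = np.divmod(uniq, n_cols)
    # indptr[r] is the position of the first stored entry of row r.
    indptr = np.searchsorted(rows, np.arange(n_rows + 1))
    return sparse.csr_matrix((data, cols, indptr), shape=shape)

i = np.array([2, 0, 2, 0, 1])
j = np.array([1, 0, 1, 0, 2])
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
A = csr_from_triplets(i, j, v, (3, 3))
reference = sparse.coo_matrix((v, (i, j)), shape=(3, 3)).tocsr()
print(np.allclose(A.toarray(), reference.toarray()))
```

    np.add.at is known to be slow on large inputs, which is one reason the timed version below sorts the keys and uses np.add.reduceat instead.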

    OK, let the code speak:

    import numpy as np
    from scipy import sparse
    from timeit import timeit
    
    
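    def tocsr(I, J, E, N):
        n = len(I)
        # Pack each (I, J) pair into one int64 key: I in the high 32 bits, J in the low 32 bits.
        # This assumes a little-endian machine and non-negative indices that fit in int32.
        K = np.empty((n,), dtype=np.int64)
        K.view(np.int32).reshape(n, 2).T[...] = J, I
        # Sorting the keys orders the entries by row first, then by column.
        S = np.argsort(K)
        KS = K[S]
        # Start positions of each run of identical keys, i.e. of duplicate (I, J) entries.
        steps = np.flatnonzero(np.r_[1, np.diff(KS)])
        # Sum the values within each run.
        ED = np.add.reduceat(E[S], steps)
        # Unpack the unique keys back into column (JD) and row (ID) indices.
        JD, ID = KS[steps].view(np.int32).reshape(-1, 2).T
        # Turn the sorted row indices into the CSR row-pointer array.
        ID = np.searchsorted(ID, np.arange(N+1))
        return sparse.csr_matrix((ED, np.array(JD, dtype=int), ID), (N, N))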
    
    def viacoo(I, J, E, N):
        return sparse.coo_matrix((E, (I, J)), (N, N)).tocsr()
    
    
    # testing and timing
    
    # correctness
    N = 1000
    A = np.random.random((N, N)) < 0.001
    I, J = np.where(A)
    
    E = np.random.random((2, len(I)))
    D = np.zeros((2,) + A.shape)
    D[:, I, J] = E
    D2 = tocsr(np.r_[I, I], np.r_[J, J], E.ravel(), N).toarray()
    
    print('correct:', np.allclose(D.sum(axis=0), D2))
    
    # speed
    N = 100000
    K = 10
    
    I, J = np.random.randint(0, N, (2, K*N))
    E = np.random.random((2 * len(I),))
    I, J, E = np.r_[I, I, J, J], np.r_[J, J, I, I], np.r_[E, E]
    
    print('N:', N, ' --  nnz (with duplicates):', len(E))
    print('direct: ', timeit('f(a,b,c,d)', number=10, globals={'f': tocsr, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
    print('via coo:', timeit('f(a,b,c,d)', number=10, globals={'f': viacoo, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
    

    Prints:

    correct: True
    N: 100000  --  nnz (with duplicates): 4000000
    direct:  7.702431229001377 secs for 10 iterations
    via coo: 41.813509466010146 secs for 10 iterations
    

    Speedup: 5x
