Efficiently construct FEM/FVM matrix


This is a typical use case for FEM/FVM equation systems, so it is perhaps of broader interest. From a triangular mesh à la

I would like to create a scipy


                      
2 answers

既然无缘, 2021-01-13 20:28
              
            
            
                                                                       
So, in the end this turned out to be the difference between COO's and CSR's sum_duplicates (just like @hpaulj suspected). Thanks to the efforts of everyone involved here (particularly @paul-panzer), a PR is underway to give tocsr a tremendous speedup.

SciPy's tocsr does a lexsort on (I, J), so it helps to organize the indices in such a way that (I, J) already comes out fairly sorted.

For nx=4, ny=2 in the above example, I and J are

[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]


Sorting each row of cells first, and then sorting the rows by their first column, like



cells = numpy.sort(cells, axis=1)
cells = cells[cells[:, 0].argsort()]


produces

[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]


For the numbers in the original post, sorting cuts down the runtime from about 40 seconds to 8 seconds.
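To make the two sorting steps concrete, here is a minimal sketch on a tiny connectivity array. The helper name sort_cells and the sample cells are made up for illustration; they are not from the original post.

```python
import numpy as np


def sort_cells(cells):
    """Sort node indices within each cell, then order cells by first node."""
    cells = np.sort(cells, axis=1)
    # kind="stable" keeps the result deterministic when first nodes tie.
    return cells[cells[:, 0].argsort(kind="stable")]


# Hypothetical triangles, given as rows of node indices.
cells = np.array([[5, 1, 6], [7, 3, 2], [2, 0, 1], [6, 5, 4]])
sorted_cells = sort_cells(cells)
print(sorted_cells)
# [[0 1 2]
#  [1 5 6]
#  [2 3 7]
#  [4 5 6]]
```

Any index arrays that are built cell by cell from sorted_cells then start out much closer to lexicographic order, which is the effect described above.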

Perhaps an even better ordering can be achieved if the nodes are numbered more appropriately in the first place. I'm thinking of Cuthill-McKee and friends.
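One way to try this idea is scipy.sparse.csgraph.reverse_cuthill_mckee. The following is only a sketch: the small structured mesh, the scrambled node labels, and the helper names adjacency and bandwidth are assumptions made up for the illustration, and whether the renumbering actually speeds up tocsr for a given mesh would have to be measured.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import reverse_cuthill_mckee


def adjacency(cells, n):
    """Symmetric node-to-node pattern: nodes sharing a triangle are linked."""
    i = cells[:, [0, 1, 2, 1, 2, 0]].ravel()
    j = cells[:, [1, 2, 0, 0, 1, 2]].ravel()
    data = np.ones(len(i), dtype=np.int8)
    return sparse.coo_matrix((data, (i, j)), shape=(n, n)).tocsr()


def bandwidth(A):
    """Largest distance of a nonzero entry from the diagonal."""
    coo = A.tocoo()
    return int(np.abs(coo.row - coo.col).max())


# Hypothetical test mesh: an nx-by-ny grid of nodes, two triangles per quad.
nx, ny = 6, 5
n = nx * ny
idx = np.arange(n).reshape(ny, nx)
a, b, c, d = idx[:-1, :-1], idx[:-1, 1:], idx[1:, :-1], idx[1:, 1:]
cells = np.concatenate([
    np.stack([a, b, c], axis=-1).reshape(-1, 3),
    np.stack([b, d, c], axis=-1).reshape(-1, 3),
])
# Scramble the node labels to mimic an unfavourable numbering.
cells = np.random.default_rng(0).permutation(n)[cells]

A = adjacency(cells, n)
perm = reverse_cuthill_mckee(A, symmetric_mode=True)

# perm[k] is the old label of the node placed at position k,
# so the new label of old node perm[k] is k.
new_label = np.empty(n, dtype=perm.dtype)
new_label[perm] = np.arange(n)
cells_rcm = new_label[cells]

print("bandwidth before:", bandwidth(A))
print("bandwidth after: ", bandwidth(adjacency(cells_rcm, n)))
```

The renumbered cells_rcm can then go through the same row and column sorting as before.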
                                                                        
                                                        
            
            
              
                
隐瞒了意图╮, 2021-01-13 20:41
              
            
            
                                                                       
I would try creating the CSR structure directly, especially if you are resorting to np.unique anyway, since this gives you sorted keys, which is half the job done.

I'm assuming you are at the point where you have i, j sorted lexicographically and the overlapping v summed using np.add.at on the optional inverse output of np.unique.

Then v and j are already in CSR format. All that's left to do is to create the indptr, which you get from np.searchsorted(i, np.arange(M+1)), where M is the number of rows (the column length). You can pass these directly to the sparse.csr_matrix constructor.

OK, let the code speak:

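import numpy as np
from scipy import sparse
from timeit import timeit


def tocsr(I, J, E, N):
    n = len(I)
    # Pack each (I, J) pair into one int64 key: J in the low 32 bits, I in the
    # high 32 bits. This assumes a little-endian machine and indices < 2**31.
    K = np.empty((n,), dtype=np.int64)
    K.view(np.int32).reshape(n, 2).T[...] = J, I
    # Sorting the keys orders the entries by row first, then by column.
    S = np.argsort(K)
    KS = K[S]
    # Start position of every run of identical keys (duplicate entries).
    steps = np.flatnonzero(np.r_[1, np.diff(KS)])
    # Sum the values within each run.
    ED = np.add.reduceat(E[S], steps)
    # Unpack the unique keys back into column and row indices.
    JD, ID = KS[steps].view(np.int32).reshape(-1, 2).T
    # Row pointer: position of the first stored entry of each row.
    ID = np.searchsorted(ID, np.arange(N+1))
    return sparse.csr_matrix((ED, np.array(JD, dtype=int), ID), (N, N))

def viacoo(I, J, E, N):
    # Reference implementation: let SciPy convert COO to CSR.
    return sparse.coo_matrix((E, (I, J)), (N, N)).tocsr()


# testing and timing

# correctness: every position appears twice, so the values must be summed
N = 1000
A = np.random.random((N, N)) < 0.001
I, J = np.where(A)

E = np.random.random((2, len(I)))
D = np.zeros((2,) + A.shape)
D[:, I, J] = E
# .toarray() replaces the .A shorthand, which newer SciPy versions no longer provide
D2 = tocsr(np.r_[I, I], np.r_[J, J], E.ravel(), N).toarray()

print('correct:', np.allclose(D.sum(axis=0), D2))

# speed
N = 100000
K = 10

I, J = np.random.randint(0, N, (2, K*N))
E = np.random.random((2 * len(I),))
# symmetric pattern in which every entry is duplicated
I, J, E = np.r_[I, I, J, J], np.r_[J, J, I, I], np.r_[E, E]

print('N:', N, ' --  nnz (with duplicates):', len(E))
print('direct: ', timeit('f(a,b,c,d)', number=10, globals={'f': tocsr, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
print('via coo:', timeit('f(a,b,c,d)', number=10, globals={'f': viacoo, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')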


Prints:

correct: True
N: 100000  --  nnz (with duplicates): 4000000
direct:  7.702431229001377 secs for 10 iterations
via coo: 41.813509466010146 secs for 10 iterations


Speedup: 5x
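
The prose at the top of this answer describes a variant based on np.unique, np.add.at and np.searchsorted, whereas the timed code uses argsort and np.add.reduceat. A minimal sketch of the np.unique variant could look as follows. The function name tocsr_unique and the flattened key i * n + j are assumptions made for the illustration (the key must fit into int64), and this variant is not timed here.

```python
import numpy as np
from scipy import sparse


def tocsr_unique(i, j, v, shape):
    m, n = shape
    # Flatten (i, j) into a single key; sorted keys are in row-major order.
    key = np.asarray(i, dtype=np.int64) * n + np.asarray(j, dtype=np.int64)
    ukey, inverse = np.unique(key, return_inverse=True)
    # Accumulate the values of duplicate entries into their unique slot.
    data = np.zeros(len(ukey), dtype=float)
    np.add.at(data, inverse.ravel(), v)
    # Recover row and column indices from the unique keys.
    rows, cols = np.divmod(ukey, n)
    # indptr[r] is the position of the first stored entry of row r.
    indptr = np.searchsorted(rows, np.arange(m + 1))
    return sparse.csr_matrix((data, cols, indptr), shape=shape)


# Small check against the COO route; (0, 1) and (2, 0) occur more than once.
i = np.array([0, 0, 1, 2, 2, 0])
j = np.array([1, 1, 2, 0, 0, 1])
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
A = tocsr_unique(i, j, v, (3, 3))
ref = sparse.coo_matrix((v, (i, j)), shape=(3, 3)).toarray()
print(np.allclose(A.toarray(), ref))  # True
```

np.add.at is known to be comparatively slow, so this sketch is meant to show the structure of the approach rather than to compete with the timings above.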
                                                                        
                                                        
            
            
              
                