Matrix/Tensor Triple Product?

一个人的身影 2021-01-04 23:47

An algorithm I'm working on requires computing, in a couple of places, a type of matrix triple product.

The operation takes three square matrices with identical dimensions…
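The question text is cut off here. Judging from the einsum signature 'ia,aj,ka->ijk' and the loop-based check used in the answers, the operation appears to be the following. This is inferred from the answers, not taken from the asker's own wording:

```latex
X_{ijk} = \sum_{a=1}^{n} A_{ia}\, B_{aj}\, C_{ka}, \qquad i, j, k = 1, \dots, n
```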

3 Answers
  • 2021-01-05 00:20

    I know this is a bit old now, but this topic comes up a lot. In MATLAB it is hard to beat tprod, a MEX file written by Jason Farquhar, available here:

    https://www.mathworks.com/matlabcentral/fileexchange/16275-tprod-arbitary-tensor-products-between-n-d-arrays

    tprod works a lot like einsum, though it is limited to a binary operation (two tensors). This is probably not a real limitation, because I suspect that einsum simply performs a series of binary operations. The order of these operations makes a big difference, and my understanding is that einsum performs them in the order the arrays are passed and does not allow multiple intermediate products.
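    If einsum here means NumPy's np.einsum, note that since NumPy 1.12 it accepts an optimize argument that lets it split a multi-operand contraction into pairwise steps and choose their order, and np.einsum_path reports the order it would use. Whether it actually splits this particular contraction depends on its cost model. A minimal sketch with arbitrary random data (sizes chosen only for illustration):

```python
import numpy as np

# Small random inputs; the size is arbitrary.
rng = np.random.default_rng(0)
n = 50
A = rng.random((n, n))
B = rng.random((n, n))
C = rng.random((n, n))

# Default (optimize=False): all operands are handled in one pass over i, j, k, a.
X_plain = np.einsum('ia,aj,ka->ijk', A, B, C)

# optimize=True allows einsum to split the work into pairwise contractions,
# in an order chosen by its cost model (it may also keep a single step).
X_opt = np.einsum('ia,aj,ka->ijk', A, B, C, optimize=True)

# einsum_path reports the contraction order einsum would use, plus a readable summary.
path, report = np.einsum_path('ia,aj,ka->ijk', A, B, C, optimize='greedy')
print(path)
print(report)

# Both evaluation strategies should agree up to rounding.
print(np.allclose(X_plain, X_opt))
```

    Printing report shows the estimated cost of the chosen path next to the naive single-pass cost, which is a quick way to see whether reordering is expected to help for a given shape.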

    tprod is also limited to dense (full) arrays. Kolda's Tensor Toolbox (mentioned in an earlier post) does support sparse tensors, but it is more limited in functionality than tprod (it does not allow repeated indices in the output). I am working on filling these gaps, but wouldn't it be nice if MathWorks did it?

  • 2021-01-05 00:41

    Let n×n be the matrix size. In MATLAB, you can:

    1. Group A and C into an n^2×n matrix AC, such that the rows of AC correspond to all combinations of rows of A and C.
    2. Post-multiply AC by B. That gives the desired result, only in a different shape.
    3. Reshape and permute dimensions to get the result in the desired form.

    Code:

    AC = reshape(bsxfun(@times, permute(A, [1 3 2]), permute(C, [3 1 2])), n^2, n); % step 1
    X = permute(reshape((AC*B).', n, n, n), [2 1 3]);                               % steps 2 and 3
    

    Check against a straightforward loop-based approach:

    % Example data:
    n = 3;
    A = rand(n,n);
    B = rand(n,n);
    C = rand(n,n);
    
    % Proposed approach:
    AC = reshape(bsxfun(@times, permute(A, [1 3 2]), permute(C, [3 1 2])), n^2, n);
    X = permute(reshape((AC*B).', n, n, n), [2 1 3]);
    
    % Loop-based approach:
    Xloop = NaN(n,n,n); % preallocate
    for ii = 1:n
        for jj = 1:n
            for kk = 1:n
                Xloop(ii,jj,kk) = sum(A(ii,:).*B(:,jj).'.*C(kk,:));
            end
        end
    end
    
    % Compute maximum relative difference:
    max(max(max(abs(X./Xloop-1))))
    
    ans =
        2.2204e-16
    

    The maximum relative difference is of the order of eps, so the result is correct to within numerical precision.
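    For readers working in Python, the three steps above translate to NumPy roughly as follows. This is a sketch, not part of the original answer; the function name triple_product_grouped is hypothetical. Because NumPy arrays are row-major by default, the reshape/transpose calls differ from the MATLAB permute calls, and the (A,C) pairing ends up matching tensor_prod_v2 in the NumPy answer.

```python
import numpy as np

def triple_product_grouped(A, B, C):
    """Hypothetical NumPy transliteration of the MATLAB grouping approach.

    Computes X[i, j, k] = sum_a A[i, a] * B[a, j] * C[k, a]
    for square n-by-n inputs A, B, C.
    """
    n = A.shape[0]
    # Step 1: AC[i, k, a] = A[i, a] * C[k, a] via broadcasting,
    # then flatten (i, k) into one row index -> shape (n*n, n).
    AC = (A[:, None, :] * C[None, :, :]).reshape(n * n, n)
    # Step 2: one matrix product sums over a -> rows (i, k), columns j.
    P = AC @ B
    # Step 3: restore axes (i, k, j), then swap the last two to get (i, j, k).
    return P.reshape(n, n, n).transpose(0, 2, 1)

# Example data, mirroring the MATLAB check.
rng = np.random.default_rng(1)
n = 3
A, B, C = (rng.random((n, n)) for _ in range(3))
X = triple_product_grouped(A, B, C)

# Loop-based reference.
Xloop = np.empty((n, n, n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            Xloop[i, j, k] = np.sum(A[i, :] * B[:, j] * C[k, :])

# Maximum relative difference; should be on the order of machine epsilon.
print(np.max(np.abs(X / Xloop - 1)))
```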

  • 2021-01-05 00:45

    Introduction and Solution Code

    np.einsum is really hard to beat, but in rare cases you can still beat it if you can bring matrix multiplication into the computation. After a few trials, it seems that bringing in matrix multiplication with np.dot lets you surpass the performance of np.einsum('ia,aj,ka->ijk', A, B, C).

    The basic idea is to break the "all-einsum" operation into a combination of np.einsum and np.dot, as listed below:

    • The element-wise products of A:[i,a] and B:[a,j] are formed with np.einsum, giving a 3D array:[i,j,a] (nothing is summed yet).
    • This 3D array is reshaped into a 2D array:[i*j,a], and the third array, C:[k,a], is transposed to [a,k]. A matrix multiplication between these two then sums over the index [a], giving the 2D product [i*j,k].
    • The product is reshaped into a 3D array:[i,j,k] for the final output.
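    In index notation, with T as a name introduced here for the intermediate 3D array and (ij) for the flattened row index, the steps above read:

```latex
T_{ija} = A_{ia}\,B_{aj}, \qquad
X_{ijk} = \sum_{a} T_{ija}\,C_{ka} = \sum_{a} T_{(ij),a}\,\bigl(C^{\mathsf{T}}\bigr)_{ak}
```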

    Here's the implementation of this first version:

    import numpy as np
    
    def tensor_prod_v1(A,B,C):   # First version of proposed method
        # Shape parameters
        m,d = A.shape
        n = B.shape[1]
        p = C.shape[0]
        
        # Form A[i,a] * B[a,j] for every (i,j,a); no summation yet, shape (m,n,d)
        AB = np.einsum('ia,aj->ija', A, B)
        
        # Sum over a via one matrix product with C.T, then reshape to (m,n,p)
        return np.dot(AB.reshape(m*n,d),C.T).reshape(m,n,p)
    

    Since the a-th index is shared by all three input arrays, there are three ways to choose which pair of arrays is combined with np.einsum before np.dot sums along the a-th index. The code listed earlier pairs (A,B); pairing (A,C) or (B,C) gives two more variations, as listed next:

    def tensor_prod_v2(A,B,C):
        # Shape parameters
        m,d = A.shape
        n = B.shape[1]
        p = C.shape[0]
        
        # Form A[i,a] * C[k,a] for every (i,k,a); no summation yet, shape (m,p,d)
        AC = np.einsum('ia,ka->ika', A, C)
        
        # Sum over a via one matrix product with B, giving (m,p,n); swap axes to (m,n,p)
        return np.dot(AC.reshape(m*p,d),B).reshape(m,p,n).transpose(0,2,1)
        
    def tensor_prod_v3(A,B,C):
        # Shape parameters
        m,d = A.shape
        n = B.shape[1]
        p = C.shape[0]
        
        # Form B[a,j] * C[k,a] for every (a,j,k); no summation yet, shape (d,n,p)
        BC = np.einsum('aj,ka->ajk', B, C)
        
        # Sum over a via one matrix product with A, then reshape to (m,n,p)
        return np.dot(A,BC.reshape(d,n*p)).reshape(m,n,p)
    

    Depending on the shapes of the input arrays, the three variations rank differently against each other, but all of them are expected to beat the all-einsum approach. The performance numbers are listed in the next section.

    Runtime Tests

    This is probably the most important section: it compares the speedups of the three variations of the proposed approach over the all-einsum approach originally proposed in the question.

    Dataset #1 (equal-shaped arrays):

    In [494]: L1 = 200
         ...: L2 = 200
         ...: L3 = 200
         ...: al = 200
         ...: 
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...: 
    
    In [495]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...: 
    1 loops, best of 3: 470 ms per loop
    1 loops, best of 3: 391 ms per loop
    1 loops, best of 3: 446 ms per loop
    1 loops, best of 3: 3.59 s per loop
    

    Dataset #2 (bigger A):

    In [497]: L1 = 1000
         ...: L2 = 100
         ...: L3 = 100
         ...: al = 100
         ...: 
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...: 
    
    In [498]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...: 
    1 loops, best of 3: 442 ms per loop
    1 loops, best of 3: 355 ms per loop
    1 loops, best of 3: 303 ms per loop
    1 loops, best of 3: 2.42 s per loop
    

    Dataset #3 (bigger B):

    In [500]: L1 = 100
         ...: L2 = 1000
         ...: L3 = 100
         ...: al = 100
         ...: 
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...: 
    
    In [501]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...: 
    1 loops, best of 3: 474 ms per loop
    1 loops, best of 3: 247 ms per loop
    1 loops, best of 3: 439 ms per loop
    1 loops, best of 3: 2.26 s per loop
    

    Dataset #4 (bigger C):

    In [503]: L1 = 100
         ...: L2 = 100
         ...: L3 = 1000
         ...: al = 100
         ...: 
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
    
    In [504]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...: 
    1 loops, best of 3: 250 ms per loop
    1 loops, best of 3: 358 ms per loop
    1 loops, best of 3: 362 ms per loop
    1 loops, best of 3: 2.46 s per loop
    

    Dataset #5 (longer a-th dimension):

    In [506]: L1 = 100
         ...: L2 = 100
         ...: L3 = 100
         ...: al = 1000
         ...: 
         ...: A = np.random.rand(L1,al)
         ...: B = np.random.rand(al,L2)
         ...: C = np.random.rand(L3,al)
         ...: 
    
    In [507]: %timeit tensor_prod_v1(A,B,C)
         ...: %timeit tensor_prod_v2(A,B,C)
         ...: %timeit tensor_prod_v3(A,B,C)
         ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
         ...: 
    1 loops, best of 3: 373 ms per loop
    1 loops, best of 3: 269 ms per loop
    1 loops, best of 3: 299 ms per loop
    1 loops, best of 3: 2.38 s per loop
    

    Conclusions: The best variation for each dataset is roughly 8x-10x faster than the all-einsum approach listed in the question, and even the slowest variation is about 5x faster.
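    Reading datasets #2 to #4 above, the fastest variation in each case is the one whose intermediate 3D array is smallest: v3 for bigger A, v2 for bigger B, v1 for bigger C. Datasets #1 and #5 have equal intermediate sizes for all three, so this observation does not separate them. Below is a hypothetical helper built on that observation. It is not part of the original answer, and the heuristic is an assumption that has only been compared against these three timings:

```python
def intermediate_sizes(m, n, p, d):
    """Element counts of the 3D intermediate each variation builds.

    Shapes follow the answer: A is (m, d), B is (d, n), C is (p, d).
    """
    return {
        "v1": m * n * d,  # A paired with B -> (m, n, d)
        "v2": m * p * d,  # A paired with C -> (m, p, d)
        "v3": d * n * p,  # B paired with C -> (d, n, p)
    }

def pick_variant(m, n, p, d):
    """Assumed heuristic: choose the variation with the smallest intermediate.

    On ties, min() returns the first key in order (v1, v2, v3).
    """
    sizes = intermediate_sizes(m, n, p, d)
    return min(sizes, key=sizes.get)

# Shapes of datasets #2-#4 as (L1, L2, L3, al) = (m, n, p, d).
for name, shape in [("bigger A", (1000, 100, 100, 100)),
                    ("bigger B", (100, 1000, 100, 100)),
                    ("bigger C", (100, 100, 1000, 100))]:
    print(name, pick_variant(*shape))
```

    This prints v3, v2 and v1 for the three shapes, matching the fastest timings above. The intuition is that all three variations do the same matrix-multiplication work, so the cost of building and reshaping the intermediate is what differs.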
