How to optimize math operations on matrix in python

后端 未结 1 378
南旧
南旧 2021-01-19 05:40

I am trying to reduce the time of a function that performs a serie of calculations with two matrix. Searching for this, I\'ve heard of numpy, but I really do not know how ap

相关标签:
1条回答
  • 2021-01-19 06:18

    In the simple example you've given, with for k in xrange(4): the loop body only executes twice (if r==s), or three times (if r!=s) and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors and if the vectors are short the overheads can outweigh the benefits. (And note in this formula, the matrices are being sliced in different dimensions, and indexed non-contiguously, which can only make things more complicated for a vectorizing implementation).

    import numpy as np
    
    distance_matrix_np = np.array(distance_matrix)
    stream_matrix_np = np.array(stream_matrix)
    n = 4
    
    def deltaC_np(r, s, sol):
        delta = 0
        sol_r, sol_s = sol[r], sol[s]
    
        K = np.array([i for i in xrange(n) if i!=r and i!=s])
    
        return np.sum(
            (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
            *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
            (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
            * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))
    

    In this numpy implementation, rather than a for loop over the elements in K, the operations are applied across all the elements in K within numpy. Also, note that your mathematical expression can be simplified. Each term in brackets on the left is the negative of the term in brackets on the right. enter image description here

    This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is equal to -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy.

    It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension

    K = np.array([i for i in xrange(n) if i!=r and i!=s])
    

    Once this is replaced with vectorizing code

    if r==s:
        K=np.arange(n-1)
        K[r:] += 1
    else:
        K=np.arange(n-2)
        if r<s:
            K[r:] += 1
            K[s-1:] += 1
        else:
            K[s:] += 1
            K[r-1:] += 1
    

    the numpy function is much faster.

    A graph of run times is shown immediately below (right at the bottom of this answer is the original graph before optimizing the numpy function). You can see that it either makes sense to use your optimized original code or the numpy code, depending on how large the matrix is.

    enter image description here

    The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code optimized to take account of the way the mathematical expression can be simplified.)

    def deltaC(r, s, sol):
        delta = 0
        sol_r, sol_s = sol[r], sol[s]
        for k in xrange(n):
            if k != r and k != s:
                delta += \
                    stream_matrix[r][k] \
                    * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                    stream_matrix[s][k] \
                    * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                    stream_matrix[k][r] \
                    * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                    stream_matrix[k][s] \
                    * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
        return delta
    
    import numpy as np
    
    def deltaC_np(r, s, sol):
        delta = 0
        sol_r, sol_s = sol[r], sol[s]
    
        if r==s:
            K=np.arange(n-1)
            K[r:] += 1
        else:
            K=np.arange(n-2)
            if r<s:
                K[r:] += 1
                K[s-1:] += 1
            else:
                K[s:] += 1
                K[r-1:] += 1
        #K = np.array([i for i in xrange(n) if i!=r and i!=s]) #TOO SLOW
    
        return np.sum(
            (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
            *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
            (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
            * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))
    
    def deltaC2(r, s, sol):
        delta = 0
        sol_r, sol_s = sol[r], sol[s]
        for k in xrange(n):
            if k != r and k != s:
                sol_k = sol[k]
                delta += \
                    (stream_matrix[r][k] - stream_matrix[s][k]) \
                    * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                    + \
                    (stream_matrix[k][r] - stream_matrix[k][s]) \
                    * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
        return delta
    
    
    import time
    
    N=200
    
    elapsed1s = []
    elapsed2s = []
    elapsed3s = []
    ns = range(10,410,10)
    for n in ns:
        distance_matrix_np=np.random.uniform(0,n**2,size=(n,n))
        stream_matrix_np=np.random.uniform(0,n**2,size=(n,n))
        distance_matrix=distance_matrix_np.tolist()
        stream_matrix=stream_matrix_np.tolist()
        sol  = range(n-1,-1,-1)
        sol_np  = np.array(range(n-1,-1,-1))
    
        Is = np.random.randint(0,n-1,4)
        Js = np.random.randint(0,n-1,4)
    
        total1 = 0
        start = time.clock()
        for reps in xrange(N):
            for i in Is:
                for j in Js:
                    total1 += deltaC(i,j, sol)
        elapsed1 = (time.clock() - start)
        start = time.clock()
    
        total2 = 0
        start = time.clock()
        for reps in xrange(N):
            for i in Is:
                for j in Js:
                    total2 += deltaC_np(i,j, sol_np)
        elapsed2 = (time.clock() - start)
    
        total3 = 0
        start = time.clock()
        for reps in xrange(N):
            for i in Is:
                for j in Js:
                    total3 += deltaC2(i,j, sol_np)
        elapsed3 = (time.clock() - start)
    
        print n, elapsed1, elapsed2, elapsed3, total1, total2, total3
        elapsed1s.append(elapsed1)
        elapsed2s.append(elapsed2)
        elapsed3s.append(elapsed3)
    
        #Check errors of one method against another
        #err = 0
        #for i in range(min(n,50)):
        #    for j in range(min(n,50)):
        #        err += np.abs(deltaC(i,j,sol)-deltaC_np(i,j,sol_np))
        #print err
    import matplotlib.pyplot as plt
    
    plt.plot(ns, elapsed1s, label='Original',lw=2)
    plt.plot(ns, elapsed3s, label='Optimized',lw=2)
    plt.plot(ns, elapsed2s, label='numpy',lw=2)
    plt.legend(loc='upper left', prop={'size':16})
    plt.xlabel('matrix size')
    plt.ylabel('time')
    plt.show()
    

    And here is the original graph before optimizing out the list comprehension in deltaC_np

    enter image description here

    0 讨论(0)
提交回复
热议问题