How to optimize math operations on matrices in Python

I am trying to reduce the run time of a function that performs a series of calculations with two matrices. Searching for this, I've heard of numpy, but I really do not know how ap

1 answer

天涯浪人

2021-01-19 06:18

In the simple example you've given, with for k in xrange(4):, the loop body executes only twice (if r != s) or three times (if r == s), and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors; if the vectors are short, the per-call overheads can outweigh the benefits. (Note also that in this formula the matrices are sliced along different dimensions and indexed non-contiguously, which only makes things harder for a vectorized implementation.)
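To check those iteration counts, here is a small count of how often the body of the question's loop runs for n = 4 (loop_body_count is a hypothetical helper written only for this illustration):

```python
def loop_body_count(n, r, s):
    """Count the k in range(n) for which the loop body runs (k != r and k != s)."""
    return sum(1 for k in range(n) if k != r and k != s)


print(loop_body_count(4, 1, 2))  # r != s -> 2
print(loop_body_count(4, 1, 1))  # r == s -> 3
```

With only two or three elements per call, a numpy version presumably spends most of its time on fixed per-call costs (array creation, fancy indexing) rather than on arithmetic.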

import numpy as np

# distance_matrix and stream_matrix are the lists of lists from the question
distance_matrix_np = np.array(distance_matrix)
stream_matrix_np = np.array(stream_matrix)
n = 4

def deltaC_np(r, s, sol):
    # sol must be a numpy array here, so that sol[K] is valid fancy indexing
    sol_r, sol_s = sol[r], sol[s]

    # every index k except r and s
    K = np.array([i for i in range(n) if i!=r and i!=s])

    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))

In this numpy implementation, rather than looping over the elements of K in Python, the operations are applied across all the elements of K at once inside numpy. Note also that your mathematical expression can be simplified: the terms come in pairs (the stream_matrix[r][k] and stream_matrix[s][k] terms, and the stream_matrix[k][r] and stream_matrix[k][s] terms), and within each pair the bracketed distance difference in one term is the negative of the one in the other.
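Written out, with f standing for stream_matrix, d for distance_matrix and σ for sol (symbols chosen here for readability), the sum computed by the original code and its factored form are:

```latex
\begin{aligned}
\Delta(r,s) &= \sum_{k \ne r,s} \Big[
      f_{rk}\,(d_{\sigma_s\sigma_k} - d_{\sigma_r\sigma_k})
    + f_{sk}\,(d_{\sigma_r\sigma_k} - d_{\sigma_s\sigma_k})
    + f_{kr}\,(d_{\sigma_k\sigma_s} - d_{\sigma_k\sigma_r})
    + f_{ks}\,(d_{\sigma_k\sigma_r} - d_{\sigma_k\sigma_s}) \Big] \\
&= \sum_{k \ne r,s} \Big[
      (f_{rk} - f_{sk})\,(d_{\sigma_s\sigma_k} - d_{\sigma_r\sigma_k})
    + (f_{kr} - f_{ks})\,(d_{\sigma_k\sigma_s} - d_{\sigma_k\sigma_r}) \Big]
\end{aligned}
```

The second line uses d_{σ_r σ_k} − d_{σ_s σ_k} = −(d_{σ_s σ_k} − d_{σ_r σ_k}) and the analogous identity for the column terms, so only two distance differences per k are needed instead of four.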

This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is equal to -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy.
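A quick numerical check that the factored form gives the same result as the four-term form, written as two standalone pure-Python functions that take the matrices as arguments (delta_four_terms and delta_two_terms are hypothetical names; they mirror the arithmetic of deltaC and deltaC2 in the full code further down):

```python
import math
import random


def delta_four_terms(f, d, sol, r, s):
    """Four-term sum, same arithmetic as deltaC (f, d are lists of lists)."""
    sol_r, sol_s = sol[r], sol[s]
    total = 0.0
    for k, sol_k in enumerate(sol):
        if k != r and k != s:
            total += (f[r][k] * (d[sol_s][sol_k] - d[sol_r][sol_k])
                      + f[s][k] * (d[sol_r][sol_k] - d[sol_s][sol_k])
                      + f[k][r] * (d[sol_k][sol_s] - d[sol_k][sol_r])
                      + f[k][s] * (d[sol_k][sol_r] - d[sol_k][sol_s]))
    return total


def delta_two_terms(f, d, sol, r, s):
    """Factored two-term sum, same arithmetic as deltaC2."""
    sol_r, sol_s = sol[r], sol[s]
    total = 0.0
    for k, sol_k in enumerate(sol):
        if k != r and k != s:
            total += ((f[r][k] - f[s][k]) * (d[sol_s][sol_k] - d[sol_r][sol_k])
                      + (f[k][r] - f[k][s]) * (d[sol_k][sol_s] - d[sol_k][sol_r]))
    return total


random.seed(0)
n = 6
f = [[random.uniform(0, n * n) for _ in range(n)] for _ in range(n)]
d = [[random.uniform(0, n * n) for _ in range(n)] for _ in range(n)]
sol = list(range(n - 1, -1, -1))  # reversed permutation, as in the benchmark
for r in range(n):
    for s in range(n):
        a = delta_four_terms(f, d, sol, r, s)
        b = delta_two_terms(f, d, sol, r, s)
        assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-6)
print("both forms agree for every (r, s)")
```

The two results can differ in the last few bits because the additions happen in a different order, hence the tolerance rather than exact equality.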

It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension:

K = np.array([i for i in range(n) if i!=r and i!=s])

Once this is replaced with code that builds K directly from np.arange, shifting the values past r and s,

if r==s:
    K=np.arange(n-1)
    K[r:] += 1
else:
    K=np.arange(n-2)
    if r<s:
        K[r:] += 1
        K[s-1:] += 1
    else:
        K[s:] += 1
        K[r-1:] += 1

the numpy function is much faster.
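To see the effect of how K is built in isolation, here is a sketch that times three ways of constructing the same index array with the standard-library timeit module: the list comprehension, the arange-and-shift code above (with the r < s and r > s branches folded together via min/max), and a boolean mask with np.flatnonzero, which is an alternative not used in the answer. The function names and the values of n, r and s are arbitrary; timings vary by machine, so no numbers are claimed here:

```python
import timeit

import numpy as np


def k_listcomp(n, r, s):
    """Original approach: a Python list comprehension turned into an array."""
    return np.array([i for i in range(n) if i != r and i != s], dtype=np.intp)


def k_arange(n, r, s):
    """The answer's arange-and-shift construction, r < s / r > s folded via min/max."""
    if r == s:
        K = np.arange(n - 1)
        K[r:] += 1
    else:
        lo, hi = min(r, s), max(r, s)
        K = np.arange(n - 2)
        K[lo:] += 1      # shift every value from position lo onward past lo
        K[hi - 1:] += 1  # value hi now sits at position hi - 1; shift past hi
    return K


def k_mask(n, r, s):
    """Alternative (not from the answer): boolean mask, then the surviving positions."""
    keep = np.ones(n, dtype=bool)
    keep[r] = False
    keep[s] = False
    return np.flatnonzero(keep)


n, r, s = 400, 17, 250  # arbitrary example values
builders = [("list comprehension", k_listcomp),
            ("arange + shifts", k_arange),
            ("boolean mask", k_mask)]
reference = k_listcomp(n, r, s)
for name, build in builders:
    assert np.array_equal(build(n, r, s), reference)
    best = min(timeit.repeat(lambda: build(n, r, s), number=1000, repeat=5))
    print(f"{name:>18}: {best * 1e3:.2f} ms per 1000 calls")
```

Taking the minimum of several repeats is the usual timeit practice, since the minimum is least affected by other activity on the machine.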

The graph of run times below shows the result (the original graph, from before this optimization of the numpy function, is at the very bottom of this answer). Depending on the matrix size, it makes sense to use either your optimized original code or the numpy code.
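If both small and large matrices occur in practice, one option is to keep list and array copies of the data and dispatch on size. A sketch: CROSSOVER_N is a hypothetical placeholder, not a value read off the graph (measure the crossover on your own machine), and delta_pure, delta_numpy and delta_auto are illustrative names; the numpy path builds K with a boolean mask rather than the answer's arange code:

```python
import numpy as np

# Hypothetical placeholder, not a value read off the graph: measure the
# crossover for your own machine and data before relying on it.
CROSSOVER_N = 50


def delta_pure(f, d, sol, r, s):
    """Two-term pure-Python loop (same arithmetic as deltaC2); f, d, sol are lists."""
    sol_r, sol_s = sol[r], sol[s]
    total = 0.0
    for k, sol_k in enumerate(sol):
        if k != r and k != s:
            total += ((f[r][k] - f[s][k]) * (d[sol_s][sol_k] - d[sol_r][sol_k])
                      + (f[k][r] - f[k][s]) * (d[sol_k][sol_s] - d[sol_k][sol_r]))
    return total


def delta_numpy(f, d, sol, r, s):
    """Same sum on NumPy arrays, in the style of deltaC_np; K comes from a boolean mask."""
    keep = np.ones(len(sol), dtype=bool)
    keep[[r, s]] = False
    K = np.flatnonzero(keep)
    sol_r, sol_s, sol_K = sol[r], sol[s], sol[K]
    return float(np.sum((f[r, K] - f[s, K]) * (d[sol_s, sol_K] - d[sol_r, sol_K])
                        + (f[K, r] - f[K, s]) * (d[sol_K, sol_s] - d[sol_K, sol_r])))


def delta_auto(f_np, d_np, sol_np, f_list, d_list, sol_list, r, s):
    """Pick an implementation by matrix size; both argument sets must hold the same data."""
    if len(sol_list) < CROSSOVER_N:
        return delta_pure(f_list, d_list, sol_list, r, s)
    return delta_numpy(f_np, d_np, sol_np, r, s)


rng = np.random.default_rng(0)
n = 20
f_np = rng.uniform(0, n**2, size=(n, n))
d_np = rng.uniform(0, n**2, size=(n, n))
sol_np = np.arange(n)[::-1]
print(delta_auto(f_np, d_np, sol_np, f_np.tolist(), d_np.tolist(), sol_np.tolist(), 2, 7))
```

Keeping both copies costs memory, but the list copy avoids indexing lists with numpy scalars on the small-matrix path.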

[Figure: run time vs. matrix size for the Original, Optimized and numpy versions]

The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code optimized to take account of the way the mathematical expression can be simplified.)

def deltaC(r, s, sol):
    # Original four-term version (the question's code)
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in range(n):
        if k != r and k != s:
            delta += \
                stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
    return delta

import numpy as np

def deltaC_np(r, s, sol):
    # numpy version of the simplified expression; sol must be a numpy array
    sol_r, sol_s = sol[r], sol[s]

    # Build K = every index except r and s, without a Python-level loop
    if r==s:
        K=np.arange(n-1)
        K[r:] += 1
    else:
        K=np.arange(n-2)
        if r<s:
            K[r:] += 1
            K[s-1:] += 1
        else:
            K[s:] += 1
            K[r-1:] += 1
    #K = np.array([i for i in range(n) if i!=r and i!=s]) #TOO SLOW

    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))

def deltaC2(r, s, sol):
    # Original loop, factored using the simplification (two products per k instead of four)
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in range(n):
        if k != r and k != s:
            sol_k = sol[k]
            delta += \
                (stream_matrix[r][k] - stream_matrix[s][k]) \
                * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                + \
                (stream_matrix[k][r] - stream_matrix[k][s]) \
                * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta


import time

N=200  # repetitions per matrix size

elapsed1s = []
elapsed2s = []
elapsed3s = []
ns = range(10,410,10)
for n in ns:
    distance_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    stream_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    distance_matrix=distance_matrix_np.tolist()
    stream_matrix=stream_matrix_np.tolist()
    sol  = list(range(n-1,-1,-1))
    sol_np  = np.array(sol)

    Is = np.random.randint(0,n-1,4)
    Js = np.random.randint(0,n-1,4)

    # time.clock() was removed in Python 3.8; perf_counter() is the usual replacement
    total1 = 0
    start = time.perf_counter()
    for reps in range(N):
        for i in Is:
            for j in Js:
                total1 += deltaC(i,j, sol)
    elapsed1 = (time.perf_counter() - start)

    total2 = 0
    start = time.perf_counter()
    for reps in range(N):
        for i in Is:
            for j in Js:
                total2 += deltaC_np(i,j, sol_np)
    elapsed2 = (time.perf_counter() - start)

    total3 = 0
    start = time.perf_counter()
    for reps in range(N):
        for i in Is:
            for j in Js:
                total3 += deltaC2(i,j, sol_np)
    elapsed3 = (time.perf_counter() - start)

    print(n, elapsed1, elapsed2, elapsed3, total1, total2, total3)
    elapsed1s.append(elapsed1)
    elapsed2s.append(elapsed2)
    elapsed3s.append(elapsed3)

    #Check errors of one method against another
    #err = 0
    #for i in range(min(n,50)):
    #    for j in range(min(n,50)):
    #        err += np.abs(deltaC(i,j,sol)-deltaC_np(i,j,sol_np))
    #print(err)

import matplotlib.pyplot as plt

plt.plot(ns, elapsed1s, label='Original',lw=2)
plt.plot(ns, elapsed3s, label='Optimized',lw=2)
plt.plot(ns, elapsed2s, label='numpy',lw=2)
plt.legend(loc='upper left', prop={'size':16})
plt.xlabel('matrix size')
plt.ylabel('time (s)')
plt.show()
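One possible way to take this further, not timed in this answer: skip building K altogether, evaluate the simplified expression for every k, and zero the k == r and k == s entries before summing. It does n rather than n - 2 multiplications per product but needs no index bookkeeping; whether it beats the arange-based K depends on your sizes, so measure it. A sketch assuming f, d and sol are NumPy arrays (delta_masked and delta_with_K are hypothetical names; delta_with_K is the list-comprehension formulation, used only as a reference):

```python
import numpy as np


def delta_masked(f, d, sol, r, s):
    """Evaluate the simplified sum for every k, then drop k == r and k == s."""
    sol_r, sol_s = sol[r], sol[s]
    terms = ((f[r, :] - f[s, :]) * (d[sol_s, sol] - d[sol_r, sol])
             + (f[:, r] - f[:, s]) * (d[sol, sol_s] - d[sol, sol_r]))
    terms[[r, s]] = 0.0  # also correct when r == s
    return float(terms.sum())


def delta_with_K(f, d, sol, r, s):
    """Reference: the same sum restricted to K = {k : k != r, k != s}."""
    K = np.array([k for k in range(len(sol)) if k != r and k != s], dtype=np.intp)
    sol_r, sol_s, sol_K = sol[r], sol[s], sol[K]
    return float(np.sum((f[r, K] - f[s, K]) * (d[sol_s, sol_K] - d[sol_r, sol_K])
                        + (f[K, r] - f[K, s]) * (d[sol_K, sol_s] - d[sol_K, sol_r])))


rng = np.random.default_rng(0)
n = 8
f = rng.uniform(0, n**2, size=(n, n))
d = rng.uniform(0, n**2, size=(n, n))
sol = np.arange(n)[::-1]  # reversed permutation, as in the benchmark
assert all(np.isclose(delta_masked(f, d, sol, r, s), delta_with_K(f, d, sol, r, s))
           for r in range(n) for s in range(n))
print("masked and K-indexed versions agree")
```

Because terms is a freshly computed array, zeroing two of its entries does not touch f, d or sol.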

And here is the original graph, from before optimizing out the list comprehension in deltaC_np:

[Figure: run times before the list comprehension in deltaC_np was optimized out]
