Why Does Looping Beat Indexing Here?

后端未结

关注

 2  1043

A few years ago, someone posted on Active State Recipes for comparison purposes, three python/NumPy functions; each of these accepted the same arguments and returne

相关标签:

2条回答

攒了一身酷

2021-01-04 14:49
TL; DR The second code above is only looping over the number of dimensions of the points (3 times through the for loop for 3D points) so the looping isn't much there. The real speed-up in the second code above is that it better harnesses the power of Numpy to avoid creating some extra matrices when finding the differences between points. This reduces memory used and computational effort.

Longer Explanation I think that the calcDistanceMatrixFastEuclidean2 function is deceiving you with its loop perhaps. It is only looping over the number of dimensions of the points. For 1D points, the loop only executes once, for 2D, twice, and for 3D, thrice. This is really not much looping at all.

Let's analyze the code a little bit to see why the one is faster than the other. calcDistanceMatrixFastEuclidean I will call fast1 and calcDistanceMatrixFastEuclidean2 will be fast2.

fast1 is based on the Matlab way of doing things as is evidenced by the repmap function. The repmap function creates an array in this case that is just the original data repeated over and over again. However, if you look at the code for the function, it is very inefficient. It uses many Numpy functions (3 reshapes and 2 repeats) to do this. The repeat function is also used to create an array that contains the the original data with each data item repeated many times. If our input data is [1,2,3] then we are subtracting [1,2,3,1,2,3,1,2,3] from [1,1,1,2,2,2,3,3,3]. Numpy has had to create a lot of extra matrices in between running Numpy's C code which could have been avoided.

fast2 uses more of Numpy's heavy lifting without creating as many matrices between Numpy calls. fast2 loops through each dimension of the points, does the subtraction and keeps a running total of the squared differences between each dimension. Only at the end is the square root done. So far, this may not sound quite as efficient as fast1, but fast2 avoids doing the repmat stuff by using Numpy's indexing. Let's look at the 1D case for simplicity. fast2 makes a 1D array of the data and subtracts it from a 2D (N x 1) array of the data. This creates the difference matrix between each point and all of the other points without having to use repmat and repeat and thereby bypasses creating a lot of extra arrays. This is where the real speed difference lies in my opinion. fast1 creates a lot of extra in between matrices (and they are created expensively computationally) to find the differences between points while fast2 better harnesses the power of Numpy to avoid these.

By the way, here is a little bit faster version of fast2:
```
def calcDistanceMatrixFastEuclidean3(nDimPoints):
  nDimPoints = array(nDimPoints)
  n,m = nDimPoints.shape
  data = nDimPoints[:,0]
  delta = (data - data[:,newaxis])**2
  for d in xrange(1,m):
    data = nDimPoints[:,d]
    delta += (data - data[:,newaxis])**2
  return sqrt(delta)
```
The difference is that we are no longer creating delta as a zeros matrix.
0 讨论(0)
发布评论:

提交评论
- 加载中...

盖世英雄少女心

2021-01-04 14:55

dis for fun:

dis.dis(calcDistanceMatrixFastEuclidean)

  2           0 LOAD_GLOBAL              0 (len)
              3 LOAD_FAST                0 (points)
              6 CALL_FUNCTION            1
              9 STORE_FAST               1 (numPoints)

  3          12 LOAD_GLOBAL              1 (sqrt)
             15 LOAD_GLOBAL              2 (sum)
             18 LOAD_GLOBAL              3 (repmat)
             21 LOAD_FAST                0 (points)
             24 LOAD_FAST                1 (numPoints)
             27 LOAD_CONST               1 (1)
             30 CALL_FUNCTION            3

  4          33 LOAD_GLOBAL              4 (repeat)
             36 LOAD_FAST                0 (points)
             39 LOAD_FAST                1 (numPoints)
             42 LOAD_CONST               2 ('axis')
             45 LOAD_CONST               3 (0)
             48 CALL_FUNCTION          258
             51 BINARY_SUBTRACT
             52 LOAD_CONST               4 (2)
             55 BINARY_POWER
             56 LOAD_CONST               2 ('axis')
             59 LOAD_CONST               1 (1)
             62 CALL_FUNCTION          257
             65 CALL_FUNCTION            1
             68 STORE_FAST               2 (distMat)

  5          71 LOAD_FAST                2 (distMat)
             74 LOAD_ATTR                5 (reshape)
             77 LOAD_FAST                1 (numPoints)
             80 LOAD_FAST                1 (numPoints)
             83 BUILD_TUPLE              2
             86 CALL_FUNCTION            1
             89 RETURN_VALUE

dis.dis(calcDistanceMatrixFastEuclidean2)

  2           0 LOAD_GLOBAL              0 (array)
              3 LOAD_FAST                0 (nDimPoints)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (nDimPoints)

  3          12 LOAD_FAST                0 (nDimPoints)
             15 LOAD_ATTR                1 (shape)
             18 UNPACK_SEQUENCE          2
             21 STORE_FAST               1 (n)
             24 STORE_FAST               2 (m)

  4          27 LOAD_GLOBAL              2 (zeros)
             30 LOAD_FAST                1 (n)
             33 LOAD_FAST                1 (n)
             36 BUILD_TUPLE              2
             39 LOAD_CONST               1 ('d')
             42 CALL_FUNCTION            2
             45 STORE_FAST               3 (delta)

  5          48 SETUP_LOOP              76 (to 127)
             51 LOAD_GLOBAL              3 (xrange)
             54 LOAD_FAST                2 (m)
             57 CALL_FUNCTION            1
             60 GET_ITER
        >>   61 FOR_ITER                62 (to 126)
             64 STORE_FAST               4 (d)

  6          67 LOAD_FAST                0 (nDimPoints)
             70 LOAD_CONST               0 (None)
             73 LOAD_CONST               0 (None)
             76 BUILD_SLICE              2
             79 LOAD_FAST                4 (d)
             82 BUILD_TUPLE              2
             85 BINARY_SUBSCR
             86 STORE_FAST               5 (data)

  7          89 LOAD_FAST                3 (delta)
             92 LOAD_FAST                5 (data)
             95 LOAD_FAST                5 (data)
             98 LOAD_CONST               0 (None)
            101 LOAD_CONST               0 (None)
            104 BUILD_SLICE              2
            107 LOAD_GLOBAL              4 (newaxis)
            110 BUILD_TUPLE              2
            113 BINARY_SUBSCR
            114 BINARY_SUBTRACT
            115 LOAD_CONST               2 (2)
            118 BINARY_POWER
            119 INPLACE_ADD
            120 STORE_FAST               3 (delta)
            123 JUMP_ABSOLUTE           61
        >>  126 POP_BLOCK

  8     >>  127 LOAD_GLOBAL              5 (sqrt)
            130 LOAD_FAST                3 (delta)
            133 CALL_FUNCTION            1
            136 RETURN_VALUE

I'm not an expert on dis, but it seems like you would have to look more at the functions that the first is calling to know why they take a while. There is a performance profiler tool with Python as well, cProfile.

0 讨论(0)