Efficient algorithm instead of looping

深忆病人 2021-02-15 13:21

I have a data set where each sample has a structure similar to this:

X = [[[], [], [], []], [[], []], [[], [], []], [[], []]]

for example:



        
3 Answers
  • 2021-02-15 13:35

    Instead of summing the dot product of every pair, which takes n * m dot products (with n = len(L1) and m = len(L2)), you can first sum all of the vectors in each list and then take a single dot product, which takes only n + m vector additions.

    Before:

    import itertools

    import numpy as np

    def calc_slow(L1, L2):
        # One dot product per (n, m) pair: len(L1) * len(L2) in total.
        result = 0
        for n, m in itertools.product(L1, L2):
            result += np.dot(n, m)
        return result
    

    After:

    def calc_fast(L1, L2):
        L1_sums = np.zeros(len(L1[0]))
        L2_sums = np.zeros(len(L2[0]))
        for vec in L1:
            L1_sums += vec
        for vec in L2:
            L2_sums += vec
        return np.dot(L1_sums, L2_sums)
    

    EDIT: Even faster version, taking advantage of numpy:

    def calc_superfast(L1, L2):
        return np.dot(np.array(L1).sum(0),
                      np.array(L2).sum(0))
    

    The output is the same:

    print(X[0], Y[0], calc_slow(X[0], Y[0]))
    print(X[0], Y[0], calc_fast(X[0], Y[0]))
    

    prints:

    [[1, 2, 3], [2, 4, 5], [2, 3, 4]] [[12, 14, 15], [12, 13, 14]] 711
    [[1, 2, 3], [2, 4, 5], [2, 3, 4]] [[12, 14, 15], [12, 13, 14]] 711.0
    

    Timing it shows that there is significant improvement. Timing code:

    import random
    import time

    def rand_vector(size=3):
        return [random.randint(1, 100) for _ in range(size)]

    def rand_list(length=200):
        return [rand_vector() for _ in range(length)]
    
    print("Generating lists...")
    L1 = rand_list(200)
    L2 = rand_list(200)
    
    print("Running slow...")
    s = time.time()
    print(calc_slow(L1, L2))
    print("Slow for (%d, %d) took %.2fs" % (len(L1), len(L2), time.time() - s))
    
    print("Running fast...")
    s = time.time()
    print(calc_fast(L1, L2))
    print("Fast for (%d, %d) took %.2fs" % (len(L1), len(L2), time.time() - s))
    

    Sample outputs:

    Generating lists...
    Running slow...
    75715569
    Slow for (100, 100) took 1.48s
    Running fast...
    75715569.0
    Fast for (100, 100) took 0.03s
    
    Generating lists...
    Running slow...
    309169971
    Slow for (200, 200) took 5.29s
    Running fast...
    309169971.0
    Fast for (200, 200) took 0.04s
    
    Running fast...
    3.05185703539e+12
    Fast for (20000, 20000) took 1.94s
    

    The operation on two lists of 20000 elements each took only ~2 seconds. Extrapolating from the 200 x 200 run (5.29 s for 40,000 pairs), I estimate the old method would need roughly 15 hours for the 400 million pairs.


    The reason this works is that you can group the operations together. Assuming you have the following lists:

    L1 = [[a, b, c], [d, e, f], [g, h, i]]
    L2 = [[u, v, w], [x, y, z]]
    

    Then summing the dot product of each pair would look like this:

    a*u + b*v + c*w + a*x + b*y + c*z +
    d*u + e*v + f*w + d*x + e*y + f*z +
    g*u + h*v + i*w + g*x + h*y + i*z
    

    You can factor out the u, v, w, x, y, and z elements:

    u*(a + d + g) + v*(b + e + h) + w*(c + f + i) +
    x*(a + d + g) + y*(b + e + h) + z*(c + f + i)
    

    Then you can further factor out those sums:

    (u + x)*(a + d + g) + (v + y)*(b + e + h) + (w + z)*(c + f + i)
    

    Which is just the dot product of the summed vectors from each initial list.
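
    The factoring above can be checked numerically. The following is a minimal sketch that reuses the sample vectors from the output shown earlier in this answer; the helper names `pairwise_dot_sum` and `summed_dot` are made up for illustration and are not part of the original code.

```python
import itertools

import numpy as np


def pairwise_dot_sum(L1, L2):
    """Sum of dot products over every (vector from L1, vector from L2) pair."""
    return sum(np.dot(u, v) for u, v in itertools.product(L1, L2))


def summed_dot(L1, L2):
    """Dot product of the element-wise sums of the vectors in L1 and in L2."""
    return np.dot(np.sum(L1, axis=0), np.sum(L2, axis=0))


# Sample vectors taken from the printed output earlier in this answer.
L1 = [[1, 2, 3], [2, 4, 5], [2, 3, 4]]
L2 = [[12, 14, 15], [12, 13, 14]]

print(np.sum(L1, axis=0))        # [ 5  9 12]  -> (a+d+g, b+e+h, c+f+i)
print(np.sum(L2, axis=0))        # [24 27 29]  -> (u+x, v+y, w+z)
print(pairwise_dot_sum(L1, L2))  # 711
print(summed_dot(L1, L2))        # 711
```

    Both routes give 5*24 + 9*27 + 12*29 = 711, matching the output above.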

  • 2021-02-15 13:44

    There's nothing you can do here. If you want the results of all the multiplications, you have to compute them, and that's what your algorithm does. About the only thing you can do is cache results in a hash table if you know there are many duplicate results, but that will cost a lot of memory if there aren't. By the way, multithreading might make your program run faster, but it will never improve its complexity, i.e. the number of operations needed.

  • 2021-02-15 13:56

    You can also bypass the need for itertools.product by directly doing the dot product on inner matrices:

    def calc_matrix(l1, l2):
        return np.array(l1).dot(np.array(l2).T).sum()
    
    def kernel(x1, x2):
        return sum(
           calc_matrix(l1, l2)
           for l1, l2 in zip(x1, x2)
        )
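
    To see why this works, it may help to look at the intermediate matrix. The sketch below assumes the same sample vectors used in Claudiu's answer; the variable name `pairwise` is introduced here only for illustration. Entry `[i, j]` of the product is the dot product of `l1[i]` and `l2[j]`, so summing the matrix gives the sum over all pairs.

```python
import numpy as np

l1 = np.array([[1, 2, 3], [2, 4, 5], [2, 3, 4]])  # n = 3 vectors
l2 = np.array([[12, 14, 15], [12, 13, 14]])       # m = 2 vectors

# pairwise[i, j] is the dot product of l1[i] and l2[j].
pairwise = l1.dot(l2.T)

print(pairwise.shape)  # (3, 2)
print(pairwise)
print(pairwise.sum())  # 711
```

    Note that the intermediate matrix has n * m entries, so its memory use grows with the product of the list lengths, whereas summing each list first only ever holds two vectors.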
    

    Edit:

    On short lists (fewer than a few thousand elements) this will be faster than Claudiu's (awesome) answer. His will scale better above these numbers:

    Using Claudiu's benchmarks:

    # len(l1) == 500
    
    In [9]: %timeit calc_matrix(l1, l2)
    10 loops, best of 3: 8.11 ms per loop
    
    In [10]: %timeit calc_fast(l1, l2)
    10 loops, best of 3: 14.2 ms per loop
    
    # len(l2) == 5000
    
    In [19]: %timeit calc_matrix(l1, l2)
    10 loops, best of 3: 61.2 ms per loop
    
    In [20]: %timeit calc_fast(l1, l2)
    10 loops, best of 3: 56.7 ms per loop
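
    For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The list lengths, the seed, and the repeat counts are arbitrary choices, and `calc_fast` is copied from Claudiu's answer; absolute timings will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np


def calc_matrix(l1, l2):
    # Builds the full n x m matrix of pairwise dot products, then sums it.
    return np.array(l1).dot(np.array(l2).T).sum()


def calc_fast(L1, L2):
    # Sums each list of vectors first, then takes a single dot product.
    L1_sums = np.zeros(len(L1[0]))
    L2_sums = np.zeros(len(L2[0]))
    for vec in L1:
        L1_sums += vec
    for vec in L2:
        L2_sums += vec
    return np.dot(L1_sums, L2_sums)


rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
for length in (500, 2000):      # illustrative sizes, not the ones timed above
    l1 = rng.integers(1, 101, size=(length, 3))
    l2 = rng.integers(1, 101, size=(length, 3))
    assert calc_matrix(l1, l2) == calc_fast(l1, l2)
    for func in (calc_matrix, calc_fast):
        # Best of 3 repeats, 3 calls per repeat, reported per call.
        best = min(timeit.repeat(lambda: func(l1, l2), number=3, repeat=3)) / 3
        print(f"{func.__name__:>11} len={length}: {best * 1e3:.2f} ms per call")
```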
    