How to speed up Levenshtein distance calculation

-上瘾入骨i 2021-02-02 18:13

I am trying to run a simulation to test the average Levenshtein distance between random binary strings.

My program is in python but I am using this C extension. The fun

4 Answers
  •  说谎 (original poster)
    2021-02-02 18:36

    You could probably run this in parallel. Generate one big list of random string pairs up front, then in your loop spawn 8 threads at a time, have each one process one chunk of the list, and add its result to the shared sum variable. Alternatively, generate 8 pairs at a time and process them 8 at a time.

    The problem with the OpenMP suggestion is that, according to Wikipedia, "This algorithm parallelizes poorly, due to a large number of data dependencies".
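To make the data-dependency point concrete, here is a minimal pure-Python sketch of the standard two-row dynamic-programming formulation. It is an illustration only, not the code of the C extension from the question, and the name `levenshtein` is my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic-programming table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # Each cell needs the cell above (prev[j]), the cell to the left
            # (curr[j - 1]) and the diagonal (prev[j - 1]). These are the
            # data dependencies: a cell cannot be filled in before its
            # neighbours, so rows and columns cannot be computed independently.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```

Since every cell waits on its neighbours, the easier win is usually to parallelize across independent string pairs rather than inside a single distance computation.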

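    from threading import Thread, Lock
    
    total = 0            # the shared sum variable (renamed so it does not shadow the builtin sum)
    total_lock = Lock()  # protects total against concurrent updates
    
    def calc_distance(offset):
        global total
        # distance() is the C extension's function from the question;
        # randoms is the pre-generated list of string pairs.
        d = distance(randoms[offset][0], randoms[offset][1])  # use whatever addressing scheme is best
        with total_lock:
            total += d
    
    threads = []
    for i in range(8):
        t = Thread(target=calc_distance, args=(i,))  # args must be a tuple
        t.start()
        threads.append(t)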
    

    Later, once the threads have been started, wait for them all to finish:

    for t in threads:
        t.join()
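The "one chunk per thread" variant could look roughly like the sketch below, which uses the standard library's `concurrent.futures.ThreadPoolExecutor` instead of managing threads by hand. Everything named here is an assumption for illustration: `distance` is a slow pure-Python stand-in for the C extension's function, and `chunk_total`, `average_distance` and `random_binary_string` are hypothetical helpers. Each worker returns a partial sum, so no lock or shared variable is needed.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def distance(a, b):
    # Stand-in for the C extension's distance(); swap in the real one.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def chunk_total(pairs):
    # Sum of distances over one chunk of (string, string) pairs.
    return sum(distance(a, b) for a, b in pairs)


def average_distance(pairs, workers=8):
    # Deal the pairs out into `workers` roughly equal chunks.
    chunks = [pairs[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(chunk_total, chunks))  # add up the partial sums
    return total / len(pairs)


def random_binary_string(n, rng):
    return "".join(rng.choice("01") for _ in range(n))


rng = random.Random(0)  # fixed seed so runs are repeatable
randoms = [(random_binary_string(16, rng), random_binary_string(16, rng))
           for _ in range(200)]
print(average_distance(randoms))
```

One caveat: in CPython, threads only give a real speedup if the C extension releases the GIL while it computes, and I have not checked whether it does. If it does not, swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` (same module, same `map` interface) is one option to try.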
    

    I think this method would also port nicely to OpenCL later, if a Levenshtein distance kernel were available (or could be written).

    This is just a quick post from memory so there are probably some kinks to work out.
