I am trying to run a simulation to test the average Levenshtein distance between random binary strings.
My program is in Python but I am using this C extension. The fun
You could run this in parallel. Generate one giant list of random pairs at the start, then in your loop spawn threads (8 at a time), each processing one chunk of the list and adding its result to the running sum. Or generate 8 pairs at a time and process them in batches of 8.
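Here is a minimal sketch of that chunked approach. It uses multiprocessing rather than threading to sidestep the GIL (pure-Python threads won't run CPU-bound work concurrently unless the C extension releases the GIL), and it stubs in a pure-Python `distance` as a stand-in for the C extension's Levenshtein function, which I'm assuming takes two strings and returns an int:

```python
import random
from multiprocessing import Pool

def distance(a, b):
    # Stand-in for the C extension: simple dynamic-programming Levenshtein.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def chunk_sum(pairs):
    # Each worker sums the distances for its own chunk of pairs.
    return sum(distance(a, b) for a, b in pairs)

def rand_bits(n):
    return ''.join(random.choice('01') for _ in range(n))

if __name__ == '__main__':
    pairs = [(rand_bits(32), rand_bits(32)) for _ in range(800)]
    chunks = [pairs[i::8] for i in range(8)]  # 8 interleaved chunks, one per worker
    pool = Pool(8)
    total = sum(pool.map(chunk_sum, chunks))
    pool.close()
    pool.join()
    print(total / float(len(pairs)))  # average distance over all pairs
```

Each worker returns its partial sum instead of mutating shared state, so no locking is needed; the parent just adds up the eight results.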
The problem with the OpenMP suggestion is that "this algorithm parallelizes poorly, due to a large number of data dependencies" (Wikipedia) - that applies to a single distance computation, though; independent string pairs can still be computed in parallel.
from threading import Thread, Lock

total = 0            # avoid shadowing the builtin sum()
total_lock = Lock()

def calc_distance(offset):
    global total
    d = distance(randoms[offset][0], randoms[offset][1])  # use whatever addressing scheme is best
    with total_lock:  # += on a shared global is not atomic across threads
        total += d

threads = []
for i in xrange(8):
    t = Thread(target=calc_distance, args=(i,))  # args must be a tuple
    t.start()
    threads.append(t)

later....

for t in threads:
    t.join()
I think this method would port nicely to OpenCL later as well, if a Levenshtein distance kernel were available (or codable).
This is just a quick post from memory so there are probably some kinks to work out.