Multiple subprocesses take a lot of time to complete

Posted by 纵饮孤独 on 2020-01-06 08:16:00

Question


I have a single process that is run using subprocess module's Popen:

import subprocess
from time import sleep, time

result = subprocess.Popen(['tesseract','mypic.png','myop'])
st = time()
# Poll until the tesseract process exits.
while result.poll() is None:
    sleep(0.001)
en = time()

print('Took :'+str(en-st))

Which results in:

Took :0.44703030586242676

Here, tesseract is called to process the image mypic.png (attached) and write the OCR result to myop.txt.

Now I want to do this in multiple processes, as suggested in this comment (or see this directly), so the code is:

lst = []
for i in range(4):
    lst.append(subprocess.Popen(['tesseract','mypic.png','myop'+str(i)]))

i=0
l = len(lst)
val = 0 
while(val!=(1<<l)-1):
    if(lst[i].poll() is None):
        print('Waiting for :'+str(i))
        sleep(0.01)
    else:
        temp = val
        val = val or (1<<(i))
        if(val!=temp):
            print('Completed for :'+temp)
    i = (i+1) %l

This code makes 4 calls to tesseract, saves the process objects in a list lst, and iterates over these objects until all of them have completed. An explanation of how the loop works is given at the bottom.

The problem is that the latter program takes a very long time to complete. It keeps waiting for the processes to finish using poll(), which returns None until the process has completed. This should not happen. It should take only a little more than 0.44s, not something like 10 minutes! Why is this happening?

I came across this specific problem while digging into pytesseract, which was taking a lot of time when run in parallel using multiprocessing or pathos. So this is a scaled-down version of a much bigger issue. My question on that can be found here.


Explanation of the loop: val is 0 initially. It is ORed with 2^i when the ith process completes. So, if there are 3 processes and the first process (i=0) completes, 2^0 = 1 is ORed with val, making it 1. Once the second and third processes complete, val becomes 2^0 | 2^1 | 2^2 = 7, and 2^3-1 is also 7. So the loop runs until val equals 2^{number of processes}-1.


Answer 1:


Per the Tesseract FAQ (emphasis mine):

Tesseract 4 also uses up to four CPU threads while processing a page, so it will be faster than Tesseract 3 for a single page.

If your computer has only two CPU cores, then running four threads will slow down things significantly and it would be better to use a single thread or maybe a maximum of two threads! Using a single thread eliminates the computation overhead of multithreading and is also the best solution for processing lots of images by running one Tesseract process per CPU core.

Set the maximum number of threads using the environment variable OMP_THREAD_LIMIT.

To disable multithreading, use OMP_THREAD_LIMIT=1.

Therefore, if you wish to run multiple tesseract processes concurrently, you may wish to decrease (or experiment with) OMP_THREAD_LIMIT. The optimal value depends on how many threads your machine can support concurrently.

For example, on my machine:

import os
import subprocess
import time

t = time.perf_counter()
tasks = [('mypic.png', 'myop{}'.format(i)) for i in range(4)]
# Copy the current environment (so PATH etc. are kept) and
# cap each tesseract process at one OpenMP thread.
env = dict(os.environ, OMP_THREAD_LIMIT='1')
procs = [subprocess.Popen(['tesseract', infile, outfile], env=env)
         for infile, outfile in tasks]
# Block until every process has exited.
for proc in procs:
    proc.wait()
print('{} s'.format(time.perf_counter()-t))

completes in 0.220 seconds, whereas the same code without the env argument typically takes between 3.1 and 5.1 seconds, with a lot of variation between runs.
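If there are more images than CPU cores, the FAQ's advice of one single-threaded Tesseract process per core could be sketched roughly as follows. This is only an illustration: run_all is a hypothetical helper (not part of subprocess or tesseract), and using os.cpu_count() as the default limit is an assumption you may want to tune.

```python
import os
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=None, thread_limit='1'):
    """Run each command as a subprocess, at most max_workers at a time.

    Hypothetical helper: returns the exit codes in the order of `commands`.
    """
    # Inherit the current environment and cap OpenMP threads per process.
    env = dict(os.environ, OMP_THREAD_LIMIT=thread_limit)
    # Assumption: one worker per CPU core unless the caller says otherwise.
    max_workers = max_workers or os.cpu_count() or 1

    def run(cmd):
        # Each worker thread just blocks until its subprocess exits.
        return subprocess.run(cmd, env=env).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


# Only attempt the demo when a tesseract executable is on PATH.
if __name__ == '__main__' and shutil.which('tesseract'):
    cmds = [['tesseract', 'mypic.png', 'myop{}'.format(i)] for i in range(4)]
    print(run_all(cmds))
```

The threads here do no real work themselves; they only limit how many tesseract processes are alive at once.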


To get your code working, use the bitwise OR operator, |, instead of the logical operator or:

val = val | (1 << (i))
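To see why this matters, here is a small sketch comparing the two operators on the same sequence of completions (the function names mark_done_logical and mark_done_bitwise are made up for illustration):

```python
def mark_done_logical(val, i):
    # `or` returns its first truthy operand, so once val is non-zero
    # the new bit (1 << i) is silently discarded.
    return val or (1 << i)


def mark_done_bitwise(val, i):
    # `|` sets bit i and keeps every bit that was already set.
    return val | (1 << i)


logical = bitwise = 0
for i in range(4):
    logical = mark_done_logical(logical, i)
    bitwise = mark_done_bitwise(bitwise, i)

print(logical, bitwise)  # prints: 1 15
```

With or, val gets stuck at 1 and can never reach (1 << 4) - 1 = 15, so the while loop never exits no matter how quickly the processes finish. With | every bit gets set and the loop can terminate.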

For example,

import time
import subprocess
lst = []
for i in range(4):
    lst.append(subprocess.Popen(['tesseract', 'mypic.png', 'myop'+str(i)]))

i = 0
l = len(lst)
val = 0
counter = 0
while(val != (1 << l)-1):
    if(lst[i].poll() is None):
        time.sleep(0.001)
    else:
        temp = val
        val = val | (1 << (i))
        if(val != temp):
            print('Completed for : {}'.format(i))
    i = (i+1) % l

    counter += 1
print('{} iterations'.format(counter))

prints output like

Completed for : 1
Completed for : 2
Completed for : 3
Completed for : 0
6121 iterations

Notice the loop still iterates thousands of times, mainly while lst[i].poll() returns None, but also because i = (i+1) % l can revisit the same value multiple times. If one iteration takes 0.001s, then 6121 iterations will take 6.121s. So the while loop is complicated and not very fast.
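If you still want to report each process as it finishes, a simpler loop is possible without the bitmask. The following is only a sketch: wait_in_completion_order is a hypothetical helper name, and the 0.01s polling interval is an arbitrary choice.

```python
import subprocess
import time


def wait_in_completion_order(procs, interval=0.01):
    """Return the indices of `procs` in the order the processes finished."""
    pending = dict(enumerate(procs))  # index -> Popen object still running
    finished = []
    while pending:
        # Check every unfinished process once per pass.
        for i, proc in list(pending.items()):
            if proc.poll() is not None:
                finished.append(i)
                del pending[i]
        # Sleep once per pass, and only if something is still running.
        if pending:
            time.sleep(interval)
    return finished
```

Called with the list of Popen objects (lst in the code above), it returns something like [1, 2, 3, 0]. Finished processes are never polled again, and the loop sleeps once per pass rather than once per unfinished process. If the completion order does not matter, calling proc.wait() on each process is simpler still.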



Source: https://stackoverflow.com/questions/53838992/multiple-subprocesses-take-a-lot-of-time-to-complete
