How can we use tqdm in a parallel execution with joblib?

Asked by 轮回少年 on 2021-02-05 01:25

I want to run a function in parallel and wait until all the parallel workers are done, using joblib, like in this example:

from math import sqrt
from joblib import Parallel, delayed

Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
6 Answers
  • 2021-02-05 01:52

    Just put range(10) inside tqdm(...)! It may seem too good to be true, but it really works (on my machine):

    from math import sqrt
    from joblib import Parallel, delayed  
    from tqdm import tqdm  
    result = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in tqdm(range(100000)))
    
  • 2021-02-05 01:54

    If your problem consists of many parts, you could split them into k subgroups, run each subgroup in parallel, and update the progress bar in between, resulting in k updates of the bar.

    This is demonstrated in the following example from the documentation.

    >>> with Parallel(n_jobs=2) as parallel:
    ...    accumulator = 0.
    ...    n_iter = 0
    ...    while accumulator < 1000:
    ...        results = parallel(delayed(sqrt)(accumulator + i ** 2)
    ...                           for i in range(5))
    ...        accumulator += sum(results)  # synchronization barrier
    ...        n_iter += 1
    

    https://pythonhosted.org/joblib/parallel.html#reusing-a-pool-of-workers
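
    To combine this chunked pattern with tqdm, you can update the bar once per subgroup. A minimal sketch, assuming an illustrative workload and chunk size (these are not part of the quoted documentation example):

    from math import sqrt
    from joblib import Parallel, delayed
    from tqdm import tqdm

    items = list(range(1000))   # illustrative workload
    chunk_size = 100            # each subgroup is one chunk

    results = []
    with Parallel(n_jobs=2) as parallel, tqdm(total=len(items)) as pbar:
        for start in range(0, len(items), chunk_size):
            chunk = items[start:start + chunk_size]
            # run one subgroup, then advance the bar by the chunk that just finished
            results.extend(parallel(delayed(sqrt)(i ** 2) for i in chunk))
            pbar.update(len(chunk))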

  • 2021-02-05 02:00

    This modifies nth's great answer (the Parallel subclass shown in a later answer) to add a flag for turning tqdm on or off and to let you specify the total ahead of time, so that the progress bar fills in correctly.

    from tqdm.auto import tqdm
    from joblib import Parallel
    
    class ProgressParallel(Parallel):
        def __init__(self, use_tqdm=True, total=None, *args, **kwargs):
            self._use_tqdm = use_tqdm
            self._total = total
            super().__init__(*args, **kwargs)
    
        def __call__(self, *args, **kwargs):
            with tqdm(disable=not self._use_tqdm, total=self._total) as self._pbar:
                return Parallel.__call__(self, *args, **kwargs)
    
        def print_progress(self):
            if self._total is None:
                self._pbar.total = self.n_dispatched_tasks
            self._pbar.n = self.n_completed_tasks
            self._pbar.refresh()
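
    A quick usage sketch (the workload size below is an illustrative assumption):

    from math import sqrt
    from joblib import delayed

    # known total up front, so the bar fills from 0 to 1000
    ProgressParallel(n_jobs=2, total=1000)(delayed(sqrt)(i ** 2) for i in range(1000))

    # same call with the bar switched off
    ProgressParallel(n_jobs=2, use_tqdm=False)(delayed(sqrt)(i ** 2) for i in range(1000))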
    
  • 2021-02-05 02:01

    Solutions that simply wrap the iterable passed to joblib.Parallel() do not truly monitor the progress of execution. Instead, I suggest subclassing Parallel and overriding the print_progress() method, as follows:

    import joblib
    from tqdm.auto import tqdm
    
    class ProgressParallel(joblib.Parallel):
        def __call__(self, *args, **kwargs):
            with tqdm() as self._pbar:
                return joblib.Parallel.__call__(self, *args, **kwargs)
    
        def print_progress(self):
            self._pbar.total = self.n_dispatched_tasks
            self._pbar.n = self.n_completed_tasks
            self._pbar.refresh()
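
    A minimal usage sketch, reusing the sqrt example from the question (the range size is arbitrary):

    from math import sqrt
    from joblib import delayed

    # the bar's total grows with dispatched tasks, and n tracks completed tasks
    results = ProgressParallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(1000))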
    
  • 2021-02-05 02:06

    I've created pqdm, a parallel tqdm wrapper built on concurrent.futures, to get this done comfortably. Give it a try!

    To install

    pip install pqdm
    

    and use

    from pqdm.processes import pqdm
    # If you want threads instead:
    # from pqdm.threads import pqdm
    
    args = [1, 2, 3, 4, 5]
    # args = range(1,6) would also work
    
    def square(a):
        return a*a
    
    result = pqdm(args, square, n_jobs=2)
    
  • 2021-02-05 02:08

    Here's a possible workaround: a ParallelExecutor factory that wraps the task iterator with either a tqdm bar, a plain-text bar, or no bar at all.

    import random
    import time

    from joblib import Parallel, delayed
    from tqdm import tqdm

    def func(x):
        time.sleep(random.randint(1, 10))
        return x

    # simple text-based progress reporter, as an alternative to tqdm
    def text_progressbar(seq, total=None):
        step = 1
        tick = time.time()
        for item in seq:
            time_diff = time.time() - tick
            avg_speed = time_diff / step
            total_str = 'of %d' % total if total else ''
            print('step', step, '%.2f' % time_diff,
                  'avg: %.2f sec/iter' % avg_speed, total_str)
            step += 1
            yield item

    # map a bar-type name to a function that wraps the task iterator
    all_bar_funcs = {
        'tqdm': lambda args: lambda x: tqdm(x, **args),
        'txt': lambda args: lambda x: text_progressbar(x, **args),
        'False': lambda args: iter,
        'None': lambda args: iter,
    }

    def ParallelExecutor(use_bar='tqdm', **joblib_args):
        def aprun(bar=use_bar, **tq_args):
            def tmp(op_iter):
                if str(bar) in all_bar_funcs.keys():
                    bar_func = all_bar_funcs[str(bar)](tq_args)
                else:
                    raise ValueError("Value %s not supported as bar type" % bar)
                return Parallel(**joblib_args)(bar_func(op_iter))
            return tmp
        return aprun

    aprun = ParallelExecutor(n_jobs=5)

    a1 = aprun(total=25)(delayed(func)(i ** 2 + j) for i in range(5) for j in range(5))
    a2 = aprun(total=16)(delayed(func)(i ** 2 + j) for i in range(4) for j in range(4))
    a3 = aprun(bar='txt')(delayed(func)(i ** 2 + j) for i in range(4) for j in range(4))
    a4 = aprun(bar=None)(delayed(func)(i ** 2 + j) for i in range(4) for j in range(4))
    