Using 100% of all cores with the multiprocessing module

醉酒成梦 2020-11-30 19:26

I have two pieces of code that I'm using to learn about multiprocessing in Python 3.1. My goal is to use 100% of all the available processors. However, the code snippets he

6 Answers
  • 2020-11-30 20:04

    Minimum example in pure Python:

    def f(x):
        while 1:
            # ---bonus: gradually use up RAM---
            x += 10000  # linear growth; use exponential for faster ending: x *= 1.01
            y = list(range(int(x))) 
            # ---------------------------------
            pass  # infinite loop, use up CPU
    
    if __name__ == '__main__':  # name guard to avoid recursive fork on Windows
        import multiprocessing as mp
        n = mp.cpu_count() * 32  # multiply guard against counting only active cores
        with mp.Pool(n) as p:
            p.map(f, range(n))
    

    Usage: to warm up on a cold day (but feel free to change the loop to something less pointless.)

    Warning: to exit, don't pull the plug or hold the power button; press Ctrl-C instead.

  • 2020-11-30 20:04

    I'd recommend the Joblib library. It is a good library for multiprocessing and is used in many ML applications, including sklearn.

    from joblib import Parallel, delayed
    
    Parallel(n_jobs=-1, prefer="processes", verbose=6)(
        delayed(function_name)(parameter1, parameter2, ...)
        for parameter1, parameter2, ... in object
    )
    

    Here n_jobs is the number of concurrent jobs. Set n_jobs=-1 to use all available cores on the machine running your code.

    More details on parameters here: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html

    In your case, a possible implementation would be:

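    from joblib import Parallel, delayed
    
    def worker(i):
        # Each call counts to 1000, printing as it goes
        print('worker ', i)
        x = 0
        while x < 1000:
            print(x)
            x += 1
    
    if __name__ == '__main__':
        # n_jobs=-1 uses all available cores
        Parallel(n_jobs=-1, prefer="processes", verbose=6)(
            delayed(worker)(num)
            for num in range(50)
        )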
    
  • 2020-11-30 20:13

    Regarding code snippet 1: How many cores / processors do you have on your test machine? It isn't doing you any good to run 50 of these processes if you only have 2 CPU cores. In fact, you're forcing the OS to spend more time context switching to move processes on and off the CPU than doing actual work.

    Try reducing the number of spawned processes to the number of cores. So "for i in range(50):" should become something like:

    import os
    # assuming you're on Windows (os.cpu_count() is the portable equivalent):
    for i in range(int(os.environ["NUMBER_OF_PROCESSORS"])):
        ...
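
    A portable variant of the same idea, as a minimal sketch: it assumes a current Python 3 (os.cpu_count() and Pool as a context manager are not available in 3.1), and sum_of_squares is a hypothetical stand-in for the real work. The pool starts one worker per logical CPU and queues the 50 tasks among them, instead of starting 50 competing processes.

```python
import multiprocessing as mp
import os


def worker_count():
    """Return the number of logical CPUs, falling back to 1 if unknown."""
    return os.cpu_count() or 1


def sum_of_squares(n):
    """CPU-bound placeholder task (hypothetical): sum of i*i for i < n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    n_workers = worker_count()
    # One worker process per logical CPU; the 50 tasks are shared out
    # among those workers rather than each getting its own process.
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(sum_of_squares, [200000] * 50)
    print(n_workers, "workers finished", len(results), "tasks")
```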
    

    Regarding code snippet 2: You're using a multiprocessing.Lock, which can only be held by a single process at a time, so you're removing all the parallelism from this version of the program. You've serialized things: processes 1 through 50 start, and a random process (say process 7) acquires the lock. Processes 1-6 and 8-50 all sit on the line:

    l.acquire()
    

    While they sit there they are just waiting for the lock to be released. Depending on the implementation of the Lock primitive, they are probably not using any CPU; they hold system resources like RAM but do no useful work. Process 7 counts and prints to 1000 and then releases the lock. The OS is then free to schedule any one of the remaining 49 processes. Whichever one it wakes up first will acquire the lock next and run while the remaining 48 wait on the Lock. This continues for the whole program.

    Basically, code snippet 2 is an example of what makes concurrency hard. You have to manage access by lots of processes or threads to some shared resource. In this particular case there really is no reason that these processes need to wait on each other though.
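
    A hedged sketch of the alternative, assuming the original snippet had roughly this shape (the question's code is not reproduced here, so count_to and worker are illustrative names): the counting runs without the lock, and the lock is held only for the short moment a process writes to the shared stdout.

```python
import multiprocessing as mp


def count_to(limit):
    """The independent, CPU-bound part: it needs no lock."""
    x = 0
    while x < limit:
        x += 1
    return x


def worker(i, lock, limit=1000):
    result = count_to(limit)   # runs in parallel with the other workers
    with lock:                 # held only while writing to shared stdout
        print("worker", i, "counted to", result)


if __name__ == "__main__":
    lock = mp.Lock()
    procs = [mp.Process(target=worker, args=(i, lock)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

    With the lock scoped this narrowly, the workers only wait on each other while printing, not while counting.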

    So of these two, snippet 1 is closer to utilizing the CPU efficiently. I think properly tuning the number of processes to match the number of cores will yield a much improved result.

  • 2020-11-30 20:18

    You can use psutil to pin each process spawned by multiprocessing to a specific CPU:

    import multiprocessing as mp
    import psutil
    
    
    def spawn():
        procs = list()
        n_cpus = psutil.cpu_count()
        for cpu in range(n_cpus):
            affinity = [cpu]
            d = dict(affinity=affinity)
            p = mp.Process(target=run_child, kwargs=d)
            p.start()
            procs.append(p)
        for p in procs:
            p.join()
            print('joined')
    
    def run_child(affinity):
        proc = psutil.Process()  # get self pid
        print('PID: {pid}'.format(pid=proc.pid))
        aff = proc.cpu_affinity()
        print('Affinity before: {aff}'.format(aff=aff))
        proc.cpu_affinity(affinity)
        aff = proc.cpu_affinity()
        print('Affinity after: {aff}'.format(aff=aff))
    
    
    if __name__ == '__main__':
        spawn()
    

    Note: As commented, psutil.Process.cpu_affinity is not available on macOS.
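
    Where psutil is not installed, the standard library offers a similar call on some platforms. The sketch below swaps in os.sched_setaffinity, which exists on Linux but not on macOS or Windows; pin_to_cpu is a hypothetical helper name, and it returns None where the call is missing.

```python
import os


def pin_to_cpu(cpu):
    """Pin the calling process to one CPU and return the new affinity set,
    or None where os.sched_setaffinity is unavailable (e.g. macOS, Windows)."""
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})   # pid 0 means "this process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        original = os.sched_getaffinity(0)
        print("before:", sorted(original))
        print("after: ", sorted(pin_to_cpu(min(original))))
        os.sched_setaffinity(0, original)   # restore the original set
    else:
        print("CPU affinity is not exposed by os on this platform")
```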

  • 2020-11-30 20:25

    To use 100% of all cores, do not create and destroy new processes.

    Create a few processes per core and link them with a pipeline.

    At the OS-level, all pipelined processes run concurrently.

    The less you write (and the more you delegate to the OS) the more likely you are to use as many resources as possible.

    python p1.py | python p2.py | python p3.py | python p4.py ...
    

    This will make maximal use of your CPU, provided every stage has enough work to stay busy.
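
    What one stage of such a pipeline might look like, as a rough sketch: the contents of p1.py, p2.py and so on are not given above, so transform and run_stage are hypothetical names, and the uppercase step merely stands in for real per-line work.

```python
import io
import sys


def transform(line):
    """Placeholder per-line work for one stage (hypothetical)."""
    return line.upper()


def run_stage(instream, outstream):
    """Read lines until EOF, transform each one, and write it downstream."""
    for line in instream:
        outstream.write(transform(line))
    outstream.flush()


if __name__ == "__main__":
    # In a real stage script this call would be
    # run_stage(sys.stdin, sys.stdout); an in-memory stream keeps the
    # sketch runnable without a pipe.
    demo_in = io.StringIO("alpha\nbeta\n")
    run_stage(demo_in, sys.stdout)
```

    Each stage reads from the previous one and writes to the next, so the OS can run the stages as separate processes at the same time.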

  • 2020-11-30 20:26

    To answer your question(s):

    Is there any way to 'force' Python to use all 100%?

    Not that I've heard of.

    Is the OS (Windows 7, 64-bit) limiting Python's access to the processors?

    Yes and no. Yes: if Python took 100%, Windows would freeze. No: you can grant Python admin privileges, which will result in a lockup.

    How do these processes relate to processors?

    They map onto them only indirectly. Each worker started by multiprocessing is a separate OS process with its own interpreter, and the OS scheduler decides which core runs it at any given moment.
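
    One way to check how multiprocessing workers appear to the OS is to have each one report its process ID. This is a small sketch assuming a current Python 3; report is a hypothetical helper name. Each distinct PID it prints belongs to a separate OS process.

```python
import multiprocessing as mp
import os


def report(_):
    """Return this worker's process ID and its parent's process ID."""
    return os.getpid(), os.getppid()


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        pairs = pool.map(report, range(4))
    print("main PID:", os.getpid())
    for pid, ppid in sorted(set(pairs)):
        print("worker PID:", pid, "parent PID:", ppid)
```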

    Instead, what are the processes using? Are they sharing all cores? And if so, is it the OS that is forcing the processes to share the cores?

    They share all cores. Unless you start a single Python instance with its affinity set to a certain core (in a multicore system), your processes will be scheduled on whichever core is free. So yes, the OS enforces the core sharing by default (or, technically, Python does).

    If you are interested in Python core affinity, check out the affinity package for Python.
