What is some example code for demonstrating multicore speedup in Python on Windows?

迷失自我 2020-12-01 20:23

I'm using Python 3 on Windows and trying to construct a toy example, demonstrating how using multiple CPU cores can speed up computation. The toy example is rendering of th

1 Answer
  • 2020-12-01 21:01

    It's very simple to demonstrate a multiprocessing speed up:

    import multiprocessing
    import time
    
    # high-resolution wall clock on every platform; time.clock() was removed in Python 3.8
    get_timer = time.perf_counter
    
    def cube_function(num):
        time.sleep(0.01)  # let's simulate it takes ~10ms for the CPU core to cube the number
        return num**3
    
    if __name__ == "__main__":  # multiprocessing guard
        # we'll test multiprocessing with pools from one to the number of CPU cores on the system
        # it won't show significant improvements after that and it will soon start going
        # downhill due to the underlying OS thread context switches
        for workers in range(1, multiprocessing.cpu_count() + 1):
            pool = multiprocessing.Pool(processes=workers)
            # let's 'warm up' our pool so it doesn't affect our measurements
            pool.map(cube_function, range(multiprocessing.cpu_count()))
            # now to the business, we'll have 10000 numbers to cube via our expensive function
            print("Cubing 10000 numbers over {} processes:".format(workers))
            timer = get_timer()  # time measuring starts now
            results = pool.map(cube_function, range(10000))  # map our range to the cube_function
            timer = get_timer() - timer  # get our delta time as soon as it finishes
            print("\tTotal: {:.2f} seconds".format(timer))
            print("\tAvg. per process: {:.2f} seconds".format(timer / workers))
            pool.close()  # no more tasks for this pool
            pool.join()  # wait for the worker processes to exit before the next run
    

    Of course, we're only simulating a 10 ms-per-number calculation here; you can replace cube_function with anything CPU-intensive for a real-world demonstration. The results are as expected:

    Cubing 10000 numbers over 1 processes:
            Total: 100.01 seconds
            Avg. per process: 100.01 seconds
    Cubing 10000 numbers over 2 processes:
            Total: 50.02 seconds
            Avg. per process: 25.01 seconds
    Cubing 10000 numbers over 3 processes:
            Total: 33.36 seconds
            Avg. per process: 11.12 seconds
    Cubing 10000 numbers over 4 processes:
            Total: 25.00 seconds
            Avg. per process: 6.25 seconds
    Cubing 10000 numbers over 5 processes:
            Total: 20.00 seconds
            Avg. per process: 4.00 seconds
    Cubing 10000 numbers over 6 processes:
            Total: 16.68 seconds
            Avg. per process: 2.78 seconds
    Cubing 10000 numbers over 7 processes:
            Total: 14.32 seconds
            Avg. per process: 2.05 seconds
    Cubing 10000 numbers over 8 processes:
            Total: 12.52 seconds
            Avg. per process: 1.57 seconds
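
    One way to read these numbers is to convert the totals into speedup (single-process time divided by N-process time) and efficiency (speedup divided by N). The sketch below does only that arithmetic on the totals printed above; the names totals and speedup_table are my own, not part of the benchmark script.

```python
# Total wall-clock seconds copied from the output above, keyed by worker count.
totals = {1: 100.01, 2: 50.02, 3: 33.36, 4: 25.00,
          5: 20.00, 6: 16.68, 7: 14.32, 8: 12.52}


def speedup_table(totals):
    """Return {workers: (speedup, efficiency)} relative to the 1-worker run."""
    baseline = totals[1]
    return {n: (baseline / t, baseline / t / n) for n, t in sorted(totals.items())}


if __name__ == "__main__":
    for n, (speedup, efficiency) in speedup_table(totals).items():
        print("{} processes: {:.2f}x speedup, {:.0%} efficiency".format(
            n, speedup, efficiency))
```

    In every row the speedup comes out close to the worker count, which is another way of saying the scaling is nearly linear.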
    

    Now, why isn't the scaling perfectly linear? First of all, it takes some time to distribute the data to the subprocesses and to collect the results; there is some cost to context switching; other tasks use my CPUs from time to time; and time.sleep() is not exactly precise (nor could it be on a non-real-time OS). Still, the results are roughly what you would expect from parallel processing.
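
    Keep in mind that time.sleep() only waits; it does not keep a core busy. If you want the demonstration to really load the CPUs, here is a minimal sketch of a CPU-bound stand-in for cube_function. The names count_primes and timed_run and the workload size are my own assumptions, chosen only for illustration; any pure-Python number crunching would do.

```python
import multiprocessing
import time


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately slow, CPU-bound)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def timed_run(workers, tasks):
    """Map count_primes over `tasks` with a pool; return (seconds, results)."""
    with multiprocessing.Pool(processes=workers) as pool:
        pool.map(count_primes, [10] * workers)  # warm up the worker processes
        start = time.perf_counter()
        results = pool.map(count_primes, tasks)
        elapsed = time.perf_counter() - start
    return elapsed, results


if __name__ == "__main__":  # multiprocessing guard, required on Windows
    tasks = [30000] * 16  # hypothetical workload; raise the numbers for longer runs
    for workers in (1, multiprocessing.cpu_count()):
        elapsed, _ = timed_run(workers, tasks)
        print("{} process(es): {:.2f} seconds".format(workers, elapsed))
```

    With real computation the speedup is limited by the number of physical cores, so expect the curve to flatten earlier than in the sleep-based run.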
