Python Multiprocessing with PyCUDA

南方客 · 2021-01-30 14:46

I've got a problem that I want to split across multiple CUDA devices, but I suspect my current system architecture is holding me back;

What I've set up is a GPU class

2 Answers
  •  北恋
     2021-01-30 15:43

    You need to get all your bananas lined up on the CUDA side of things first, then think about the best way to get this done in Python [shameless rep whoring, I know].

    The CUDA multi-GPU model is pretty straightforward before CUDA 4.0 - each GPU has its own context, and each context must be established by a different host thread. So the idea in pseudocode is:

    1. Application starts, and the process uses the API to determine the number of usable GPUs (beware things like compute mode in Linux)
    2. Application launches a new host thread per GPU, passing a GPU id. Each thread implicitly or explicitly calls the equivalent of cuCtxCreate(), passing the GPU id it has been assigned
    3. Profit!

    In Python, this might look something like this:

    import threading
    from pycuda import driver
    
    class GPUThread(threading.Thread):
        def __init__(self, gpuid):
            threading.Thread.__init__(self)
            self.gpuid = gpuid
    
        def run(self):
            # A CUDA context must be created by the thread that will use it
            self.ctx = driver.Device(self.gpuid).make_context()
            self.device = self.ctx.get_device()
            print("%s has device %s, api version %s"
                  % (self.name, self.device.name(), self.ctx.get_api_version()))
            # Profit!
            self.ctx.pop()     # deactivate the context on this thread
            self.ctx.detach()  # then release its resources
    
    driver.init()
    ngpus = driver.Device.count()
    threads = [GPUThread(i) for i in range(ngpus)]
    for t in threads:
        t.start()   # start every thread before joining any of them
    for t in threads:
        t.join()
    
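    Note that the context is created, popped and detached entirely inside run(), so each context lives only on the thread that owns it, and all threads are started before any is joined, so the GPUs actually work concurrently.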

    This assumes it is safe to just establish a context without any checking of the device beforehand. Ideally you would check the compute mode to make sure it is safe to try, then use an exception handler in case a device is busy. But hopefully this gives the basic idea.
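    A minimal sketch of that check might look like the following (the COMPUTE_MODE device attribute and the compute_mode enum are from pycuda.driver; treating any context-creation failure as "device busy, skip it" is just one plausible policy, not the only one):
    
    from pycuda import driver
    
    driver.init()
    usable = []
    for i in range(driver.Device.count()):
        dev = driver.Device(i)
        mode = dev.get_attribute(driver.device_attribute.COMPUTE_MODE)
        if mode == driver.compute_mode.PROHIBITED:
            continue  # no contexts can be created on this device at all
        try:
            # On an exclusive-mode device this raises if it is already in use
            ctx = dev.make_context()
            ctx.pop()
            ctx.detach()
            usable.append(i)
        except driver.Error:
            pass  # device busy or otherwise unavailable - skip it
    print("Usable GPUs:", usable)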
