Python Multiprocessing with PyCUDA

南方客 2021-01-30 14:46

I've got a problem that I want to split across multiple CUDA devices, but I suspect my current system architecture is holding me back.

What I've set up is a GPU class

2 Answers
  •  野趣味
    野趣味 (OP)
    2021-01-30 15:36

    What you need is a multi-threaded implementation of the built-in map function. Here is one implementation. With a little modification to suit your particular needs, you get:

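    import threading

    def cuda_map(args_list, gpu_instances):
        """Apply gpufunction to every item, spreading the work over all GPU instances."""
        result = [None] * len(args_list)
        n_gpus = len(gpu_instances)

        def task_wrapper(gpu_instance, task_indices):
            # Each thread writes only to its own indices, so no locking is needed.
            for i in task_indices:
                result[i] = gpu_instance.gpufunction(args_list[i])

        # Thread i takes items i, i + n_gpus, i + 2 * n_gpus, ... (round-robin split).
        # range() replaces the Python 2-only xrange() and works on both versions.
        threads = [threading.Thread(
                        target=task_wrapper,
                        args=(gpu_i, range(i, len(args_list), n_gpus))
                  ) for i, gpu_i in enumerate(gpu_instances)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for every GPU instance to finish before returning

        return result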
    

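    It is more or less the same as what you have above, with the big difference being that each GPU instance works through its share of the arguments in its own thread, so you don't spend time waiting for each single gpufunction call to complete before starting the next one.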
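To see how the round-robin split behaves without any CUDA hardware, here is a minimal sketch of the same idea written with the standard library's `concurrent.futures`. `FakeGPU` and `cuda_map_pool` are hypothetical names made up for illustration; `FakeGPU` only stands in for the GPU class from the question, whose real methods are not shown in the thread. The sketch assumes at least one GPU instance is passed in, and it does not show whether the real class is safe to call from worker threads.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeGPU:
    """Hypothetical stand-in for the question's GPU class (no CUDA needed)."""

    def __init__(self, device_id):
        self.device_id = device_id

    def gpufunction(self, arg):
        # Placeholder work; a real class would run something on the device here.
        return (self.device_id, arg * arg)


def cuda_map_pool(args_list, gpu_instances):
    """Same round-robin split as cuda_map, using one pool worker per instance."""
    n_gpus = len(gpu_instances)  # assumed to be >= 1
    result = [None] * len(args_list)

    def run_slice(gpu_index):
        gpu = gpu_instances[gpu_index]
        # Instance k handles items k, k + n_gpus, k + 2 * n_gpus, ...
        for i in range(gpu_index, len(args_list), n_gpus):
            result[i] = gpu.gpufunction(args_list[i])

    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        # list() consumes the iterator so a worker's exception is re-raised here.
        list(pool.map(run_slice, range(n_gpus)))
    return result


gpus = [FakeGPU(0), FakeGPU(1)]
out = cuda_map_pool([1, 2, 3, 4, 5], gpus)
print(out)  # [(0, 1), (1, 4), (0, 9), (1, 16), (0, 25)]
```

Each tuple records which stand-in instance handled the item, so the output shows items alternating between the two instances while keeping the input order. One practical difference from the plain `threading.Thread` version is that an exception raised inside a worker surfaces in the caller instead of leaving a silent `None` in the result list.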
