clEnqueueNDRangeKernel blocking on Nvidia hardware? (Also Multi-GPU)

半阙折子戏 2020-12-21 15:31

On Nvidia GPUs, when I call clEnqueueNDRangeKernel, the program waits for it to finish before continuing. More precisely, I'm calling its equivalent C++ binding,

2 Answers
  • 2020-12-21 15:57

    Yes, you're right. AFAIK, the Nvidia implementation of clEnqueueNDRangeKernel is synchronous. I noticed this when using my library (Brahma) as well. I don't know of a workaround or a way to prevent this, other than using a different implementation (and hence a different device).

  • 2020-12-21 16:03

    I emailed the Nvidia guys and actually got a pretty fair response. There's a sample in the Nvidia SDK that shows that, for each device, you need to create separate:

    • queues - One per device, so you can address each device and enqueue work to it.
    • buffers - One per device for each array you need to pass in. Otherwise the devices pass a single buffer around, each waiting for it to become available, which effectively serializes everything.
    • kernels - I think this one's optional, but it makes specifying arguments a lot easier.

    Furthermore, you have to call EnqueueNDRangeKernel for each queue in separate threads. That's not in the SDK sample, but the Nvidia guy confirmed that the calls are blocking.
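
A minimal sketch of the thread-per-queue pattern described above, written without any OpenCL dependency so it compiles on its own. `BlockingSubmit` and `submit_on_all_devices` are hypothetical names introduced only for this illustration; the callback stands in for the blocking enqueue call on one device's own queue. This is one way to arrange the host side under those assumptions, not the code from the Nvidia SDK sample.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for submitting work to one device. In real code the
// body would call enqueueNDRangeKernel on that device's own command queue,
// using that device's own buffers and kernel object.
using BlockingSubmit = std::function<void(std::size_t device_index)>;

// Calls `submit` once per device, each call on its own host thread, then
// waits for every thread to return. A blocking call stalls only the thread
// that made it, so the devices are free to run at the same time.
inline void submit_on_all_devices(std::size_t device_count,
                                  const BlockingSubmit& submit) {
    std::vector<std::thread> workers;
    workers.reserve(device_count);
    for (std::size_t i = 0; i < device_count; ++i) {
        // Capture the index by value so each thread keeps its own copy.
        workers.emplace_back([&submit, i] { submit(i); });
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

With the OpenCL C++ binding, the callback body would be something along the lines of `queues[i].enqueueNDRangeKernel(kernels[i], cl::NullRange, global, local); queues[i].finish();`, where `queues`, `kernels`, `global` and `local` are assumed per-device objects you have already created. Note that an exception escaping the callback would terminate the program, so real code should catch errors inside it.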

    After doing all this, I achieved concurrency on multiple GPUs. However, there's still a bit of a problem. On to the next question...
