Multiple CUDA contexts for one device - any sense?


Question


I thought I had a grasp of this, but apparently I do not :) I need to perform parallel H.264 stream encoding with NVENC from frames that are not in any of the formats accepted by the encoder, so I have the following pipeline (a code sketch follows the list):

  • A callback informing that a new frame has arrived is called.
  • I copy the frame to CUDA memory and perform the needed color space conversions (only the first cuMemcpy is synchronous, so I can return from the callback; all subsequent operations are pushed onto a dedicated stream).
  • I push an event onto the stream and have another thread waiting for it; as soon as the event is set, I take the CUDA memory pointer holding the frame in the correct color space and feed it to the encoder.
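
Roughly, each per-frame step looks like this (a simplified sketch using the driver API; convert_colorspace and all buffer names are placeholders, not the real implementation):

    #include <cuda.h>

    /* Stand-in for the real color space conversion kernel launch
       (hypothetical; in practice this would be a cuLaunchKernel call). */
    static void convert_colorspace(CUdeviceptr src, CUdeviceptr dst,
                                   CUstream stream)
    {
        (void)src; (void)dst; (void)stream;
    }

    /* Called when a new frame arrives. */
    void on_frame_arrived(const void *host_frame, size_t frame_bytes,
                          CUdeviceptr d_input, CUdeviceptr d_converted,
                          CUstream stream, CUevent frame_ready)
    {
        /* Synchronous copy: the caller's frame buffer can be reused as
           soon as this callback returns. */
        cuMemcpyHtoD(d_input, host_frame, frame_bytes);

        /* All remaining work is queued asynchronously on this thread's
           dedicated stream. */
        convert_colorspace(d_input, d_converted, stream);

        /* Mark the point at which d_converted holds a complete frame. */
        cuEventRecord(frame_ready, stream);
    }

    /* Encoder-side thread: wait for the event, then hand the converted
       frame to the NVENC session. */
    void feed_encoder(CUevent frame_ready, CUdeviceptr d_converted)
    {
        cuEventSynchronize(frame_ready);
        /* ... pass d_converted to the NVENC encode session ... */
        (void)d_converted;
    }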

For some reason I assumed that I need a dedicated context for each thread if I perform this pipeline in parallel threads. The code was slow, and after some reading I understood that context switching is actually expensive; I then concluded that it makes no sense anyway, since a context owns the whole GPU, so I would lock out any parallel processing from the other transcoder threads.

Question 1: In this scenario am I good with using a single context and an explicit stream created on this context for each thread that performs the mentioned pipeline?

Question 2: Can someone enlighten me on what is the sole purpose of the CUDA device context? I assume it makes sense in a multiple GPU scenario, but are there any cases where I would want to create multiple contexts for one GPU?


Answer 1:


Question 1: In this scenario am I good with using a single context and an explicit stream created on this context for each thread that performs the mentioned pipeline?

You should be fine with a single context.
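
For example (a minimal sketch, assuming the driver API and a single device; in the runtime API the same thing happens implicitly through the primary context):

    #include <cuda.h>

    CUcontext ctx;    /* one context shared by all transcoder threads */

    void init_device(void)
    {
        CUdevice dev;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);    /* single context for the process */
    }

    void transcoder_thread(void)
    {
        CUstream stream;
        cuCtxSetCurrent(ctx);    /* bind the shared context to this thread */
        cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING);
        /* ... run the per-frame pipeline on this stream ... */
        cuStreamDestroy(stream);
    }

Work queued on the different streams can overlap on the device (hardware permitting), since all of it lives in one context.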

Question 2: Can someone enlighten me on what is the sole purpose of the CUDA device context? I assume it makes sense in a multiple GPU scenario, but are there any cases where I would want to create multiple contexts for one GPU?

The CUDA device context is discussed in the programming guide. It represents all of the state (memory map, allocations, kernel definitions, and other state-related information) associated with a particular process (i.e. associated with that particular process' use of a GPU). Separate processes will normally have separate contexts (as will separate devices), as these processes have independent GPU usage and independent memory maps.

If you have multi-process usage of a GPU, you will normally create multiple contexts on that GPU. As you've discovered, it's possible to create multiple contexts from a single process, but not usually necessary.

And yes, when you have multiple contexts, kernels launched in those contexts will require context switching to go from one kernel in one context to another kernel in another context. Those kernels cannot run concurrently.
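
To make that concrete, here is a sketch of what multiple contexts on one device look like; it only illustrates the switching discussed above and is not a recommended pattern:

    #include <cuda.h>

    void two_contexts_on_one_gpu(void)
    {
        CUdevice dev;
        CUcontext ctx_a, ctx_b;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx_a, 0, dev);
        cuCtxCreate(&ctx_b, 0, dev);   /* second context, same GPU */

        cuCtxSetCurrent(ctx_a);
        /* kernels launched here belong to ctx_a ... */

        cuCtxSetCurrent(ctx_b);
        /* ... and kernels launched here belong to ctx_b; the GPU must
           context-switch between the two, so their kernels never run
           concurrently. */

        cuCtxDestroy(ctx_b);
        cuCtxDestroy(ctx_a);
    }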

CUDA runtime API usage manages contexts for you. You normally don't explicitly interact with a CUDA context when using the runtime API. However, in driver API usage, the context is explicitly created and managed.
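
For example (a sketch contrasting the two APIs; error checking omitted):

    /* Runtime API: the context is implicit; the device's primary context
       is created lazily on first use. */
    #include <cuda_runtime.h>

    void runtime_style(void)
    {
        void *p;
        cudaSetDevice(0);          /* optional; device 0 is the default */
        cudaMalloc(&p, 1 << 20);   /* primary context created here */
        cudaFree(p);
    }

    /* Driver API: the context is explicit. Retaining the primary context
       (rather than calling cuCtxCreate) interoperates cleanly with the
       runtime API. */
    #include <cuda.h>

    void driver_style(void)
    {
        CUdevice dev;
        CUcontext ctx;
        CUdeviceptr p;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuDevicePrimaryCtxRetain(&ctx, dev);
        cuCtxSetCurrent(ctx);
        cuMemAlloc(&p, 1 << 20);
        cuMemFree(p);
        cuDevicePrimaryCtxRelease(dev);
    }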



Source: https://stackoverflow.com/questions/29964392/multiple-cuda-contexts-for-one-device-any-sense
