Any particular function to initialize GPU other than the first cudaMalloc call?

前端 未结 1 1216
無奈伤痛
無奈伤痛 2020-11-30 11:29

The first cudaMalloc call is slow (like 0.2 sec) because of some initialization work on GPU. Is there any function that solely do initialization, so that I can separate the

相关标签:
1条回答
  • 2020-11-30 12:06

    A call to

    cudaFree(0);
    

    is the canonical way to force lazy context establishment in the CUDA runtime. You can't reduce the overhead, that is a function of driver, runtime and operating system latencies. But the call above will let you control how/when those overheads occur during program execution.

    EDIT in 2015 to add that the heuristics of context initialisation in the runtime API have subtly changed over time so that cudaSetDevice now establishes a context, so the cudaFree() call isn't explicitly required to intialise a context, you can use cudaSetDeviceinstead. Also note that some set-up time will still be incurred at the first kernel launch, whereas before this wasn't the case. For for kernel timing, it is best to include a warm-up call first before launching the kernel you will time to remove this set-up latency. It appears that the various profiling tools have enough granularity built in to avoid this without any extra API calls or kernel calls.

    0 讨论(0)
提交回复
热议问题