GPU card resets after 2 seconds

Submitted by 匆匆过客 on 2020-01-16 00:50:17

Question


I'm using an NVIDIA GeForce card that gives an error after 2 seconds whenever I try to run a CUDA program on it. I read here that you can use the TdrLevel key in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers. However, I don't see any such key in the registry. Does it need to be added manually? Has anybody else experienced this problem? If so, how did you solve it? Thanks.


Answer 1:


I'm assuming you are using Windows Vista or later.

The article you linked to contains a list of registry keys controlling the Microsoft WDDM Timeout Detection and Recovery (TDR) mechanism. As talonmies commented, it is not the card giving an error; it is the Windows WDDM TDR mechanism, which detects a long-running kernel and kills it to recover the GPU for display purposes.

While a kernel is running, the GPU is occupied with the compute work and cannot update your display, and most people would consider a frozen display bad. Some developers choose to increase the delay so that they can develop longer-running kernels, with the understanding that their system may become unresponsive for a few seconds. You may also have to disable TDR if you are using a debugger with a WDDM GPU (NVIDIA Tesla GPUs support TCC, which avoids all the WDDM headaches).

If the keys do not exist, you should create them as DWORD values under the GraphicsDrivers key. I would suggest:

  • TdrLevel 3 (i.e. enabled)
  • TdrDelay 5 (i.e. 5 seconds)
  • TdrLimitTime 10
  • TdrLimitCount 10 (i.e. max 10 timeouts in 10 seconds)

Alternatives are to use a second GPU for execution, or to adjust your problem set so that the kernel time stays under 2 seconds. Really big problems should be run on a dedicated GPU. That assumes it's not a bug in your kernel, of course!




Answer 2:


Well, you get a timeout if your CUDA kernel runs longer than 2 seconds on a graphics card that is connected to a monitor. To avoid this, one option is to split your program into several kernel calls that each stay below the 2-second limit. The other option is to use a graphics card that is not connected to a monitor; then there is no timeout limit.
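The first option, splitting the work into several short kernel calls, could look roughly like the sketch below. The kernel scale_chunk, the helper chunk_len, and the sizes n and chunk are made up for illustration and are not from the question; the right chunk size depends on your kernel and GPU and has to be found by measurement.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical workload: scale `count` elements starting at `offset`.
__global__ void scale_chunk(float *data, size_t offset, size_t count, float factor)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count) {
        data[offset + i] *= factor;
    }
}

// Number of elements the launch starting at `offset` should process.
static size_t chunk_len(size_t n, size_t offset, size_t chunk)
{
    return (n - offset < chunk) ? (n - offset) : chunk;
}

int main()
{
    const size_t n = (size_t)1 << 24;      // total elements (assumed)
    const size_t chunk = (size_t)1 << 20;  // elements per launch (assumed; tune it)
    const int threads = 256;

    float *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // One short launch per chunk instead of a single long-running kernel.
    for (size_t offset = 0; offset < n; offset += chunk) {
        size_t count = chunk_len(n, offset, chunk);
        unsigned int blocks = (unsigned int)((count + threads - 1) / threads);
        scale_chunk<<<blocks, threads>>>(d_data, offset, count, 2.0f);

        // Wait for this chunk and check for errors before queuing the next one.
        err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "chunk at offset %zu failed: %s\n",
                    offset, cudaGetErrorString(err));
            cudaFree(d_data);
            return 1;
        }
    }

    cudaFree(d_data);
    return 0;
}
```

Smaller chunks keep each launch further from the limit at the cost of more launch overhead, so it is worth timing a few chunk sizes on the actual card.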

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, i);  // i is the device index
// prop.kernelExecTimeoutEnabled is nonzero if kernels have a run-time limit

The kernelExecTimeoutEnabled field of the device properties shows whether the timeout is enabled for that device.
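For a complete program, a minimal sketch that lists every device and reports the flag might look like this. It reads the flag through cudaDeviceGetAttribute with cudaDevAttrKernelExecTimeout, which reports the same information as prop.kernelExecTimeoutEnabled; that substitution is my choice, made so the sketch does not depend on the field being present in every toolkit version. The helper has_time_limit is a made-up name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Interpret the attribute value: nonzero means kernels have a run-time limit.
static bool has_time_limit(int attr_value)
{
    return attr_value != 0;
}

int main()
{
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < device_count; ++i) {
        // Device name, for readable output.
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                    i, cudaGetErrorString(err));
            return 1;
        }

        // Same flag as prop.kernelExecTimeoutEnabled, read as an attribute.
        int timeout_flag = 0;
        err = cudaDeviceGetAttribute(&timeout_flag, cudaDevAttrKernelExecTimeout, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaDeviceGetAttribute(%d) failed: %s\n",
                    i, cudaGetErrorString(err));
            return 1;
        }

        printf("Device %d (%s): kernel run-time limit %s\n",
               i, prop.name, has_time_limit(timeout_flag) ? "enabled" : "disabled");
    }
    return 0;
}
```

A device that reports the limit as disabled is a candidate for running long kernels without changing any registry settings.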

Finally, I read about the registry entry as well, but it seems to be discouraged (I'm using GNU/Linux, so it is not an option for me). I might be wrong, but I think you need to add such a key yourself.



Source: https://stackoverflow.com/questions/9602312/gpu-card-resets-after-2-seconds
