Whenever I ran CUDA code that kept the GPU busy for a long time, I would get an error like the following:
Error: C:/kernel.cu:170, code: 4, reason: unspecified launch failure
After spending many sleepless nights trying to figure out what was wrong with my setup, I finally found the reason why!
Windows has a protection mechanism to ensure that your computer doesn't freeze when the GPU takes a long time to process something: if the GPU stays busy for too long, Windows assumes it has hung and resets the display driver. As a result, my expensive CUDA code would time out because the kernel was holding the GPU for too long.
What I did was increase the number of seconds that Timeout Detection and Recovery (TDR) waits before stepping in (the TdrDelay value in the Windows Registry, which defaults to 2 seconds). You have to restart your system after making the change for the new setting to take effect.
You can also disable TDR entirely, but that makes your system much less stable: without this protection it is more prone to freezing. With TDR disabled I can get away with much more expensive processing, but if I abuse it and run an algorithm that takes REALLY long to process, my system still freezes. I'm not sure why that happens (maybe it's a memory issue), but it's a step in the right direction.
If you would rather keep the protection in place, increase the timeout instead of disabling TDR completely.
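To make the two options above concrete, here is a rough sketch that builds the text of a .reg file for either one. The key path and the TdrDelay and TdrLevel value names are the ones listed on Microsoft's TDR registry keys page; the function name make_tdr_reg and the 60-second figure are just my own illustrative choices, not recommended values. The script only prints the text and does not touch the registry.

```python
# Sketch only: builds .reg file text for the TDR settings discussed above.
# make_tdr_reg and the 60-second delay are hypothetical/illustrative choices.

# Key documented by Microsoft for the TDR registry values.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def make_tdr_reg(delay_seconds=None, disable=False):
    """Return .reg text that sets TdrDelay and/or turns TDR detection off."""
    if delay_seconds is None and not disable:
        raise ValueError("nothing to change: pass delay_seconds or disable=True")

    lines = ["Windows Registry Editor Version 5.00", "", f"[{TDR_KEY}]"]

    if delay_seconds is not None:
        if not isinstance(delay_seconds, int) or delay_seconds <= 0:
            raise ValueError("delay_seconds must be a positive integer")
        # REG_DWORD values are written as 8 hexadecimal digits in .reg files.
        lines.append(f'"TdrDelay"=dword:{delay_seconds:08x}')

    if disable:
        # TdrLevel 0 means timeout detection is disabled (no protection).
        lines.append('"TdrLevel"=dword:00000000')

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Option 1: keep TDR but raise the timeout to 60 seconds.
    print(make_tdr_reg(delay_seconds=60))
    # Option 2: disable TDR entirely (less safe).
    print(make_tdr_reg(disable=True))
```

Check the printed values against the Microsoft page before applying them with regedit, and remember that the change only takes effect after a restart.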
For a detailed explanation visit this link: https://www.pugetsystems.com/labs/hpc/Working-around-TDR-in-Windows-for-a-better-GPU-computing-experience-777/
For instructions from Microsoft on how to make the edits, visit this link: https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys
More discussion of the topic: https://training.acceleware.com/blog/timeout-detection-windows-display-driver-model-when-running-cuda-kernels-symptoms-solutions-and
Linux users, check out this link: https://nvidia.custhelp.com/app/answers/detail/a_id/3029/~/using-cuda-and-x