UPDATE: This video and content apply to CUDA Toolkit 9. I have since moved to CUDA Toolkit 10 and didn’t run into any of these issues.
So you are combing through the internet trying to find a way to successfully run CUDA dynamic parallelism on your Windows-based machine using Visual Studio 2017. You keep getting an error message at compile time that goes something like this:
calling a __global__ function("myfunction") from a __global__ function("myfunction") is only allowed on the compute_35 architecture or above
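For reference, here is a minimal sketch of the kind of code that triggers this message when the project is built for an architecture below compute_35. The kernel names (parentKernel, childKernel) and launch sizes are made up for illustration and are not taken from the video. The build line in the comment is the command-line equivalent as I understand it: dynamic parallelism normally needs relocatable device code turned on and the cudadevrt library linked in, in addition to the architecture setting.

```cuda
// Hypothetical example: a parent kernel that launches a child kernel
// from the device (dynamic parallelism).
//
// Assumed command-line build, adjust the architecture to your card:
//   nvcc -arch=sm_35 -rdc=true example.cu -o example -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: launched from the device, not from the host.
__global__ void childKernel(int parentBlock)
{
    printf("child thread %d launched by parent block %d\n",
           threadIdx.x, parentBlock);
}

// Parent kernel: the launch inside this function is what the compiler
// rejects when the target architecture is lower than compute_35.
__global__ void parentKernel()
{
    // Only one thread per block launches a child grid.
    if (threadIdx.x == 0) {
        childKernel<<<1, 4>>>(blockIdx.x);
    }
}

int main()
{
    // Two parent blocks, so two child grids of four threads each.
    parentKernel<<<2, 32>>>();

    // Wait on the host for the parent and its child grids to finish.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the build settings are right, each parent block should launch one child grid and the child threads print their own lines. The order of those lines is not guaranteed.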
Well have no fear, cudaeducation.com is here!!!
You should be able to successfully compile and run your program after making all the changes shown in the video.
NOTE FOR VISUAL STUDIO 2017 USERS: Make sure that when you update the Code Generation setting that you uncheck “Inherit from parent or project defaults” in the lower left-hand corner of the window that pops up. If not, it will default to a lower compute architecture and you will have the same error occur at compile time, regardless of what you update the string to.
I am running a GeForce GTX 1050 Ti graphics card, and therefore use “compute_61,sm_61” as the Code Generation value, even though “compute_35,sm_35” would work just fine.
If you’d like to know the maximum sm architecture you can use for your graphics card, check out http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
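You can also ask the card directly. The sketch below uses the CUDA runtime calls cudaGetDeviceCount and cudaGetDeviceProperties to read each device's compute capability and print it in the “compute_XX,sm_XX” form that the Code Generation setting expects. Treat it as a quick helper under the assumption that the CUDA runtime is installed and working, not as part of the original walkthrough.

```cuda
// Prints a Code Generation string for every CUDA device in the machine.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
            return 1;
        }
        // major/minor form the compute capability, e.g. 6 and 1 for 6.1.
        printf("Device %d: %s -> compute_%d%d,sm_%d%d\n",
               i, prop.name,
               prop.major, prop.minor,
               prop.major, prop.minor);
    }
    return 0;
}
```

Anything that reports 3.5 or higher supports dynamic parallelism. A GTX 1050 Ti should report 6.1, which matches the “compute_61,sm_61” string used above.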