Intensive Shaders (>1 second per primitive)

Okay, so I built a nice little app to render a buffer-sized quad with an arbitrary shader on it to an offscreen buffer and save it out as a file. All in the name of speeding up some computations for precomputed textures. And I mean a huge speed-up. My tests so far indicate that the entire texture could be generated in a little over a second using GLSL, whereas it takes nearly 14 hours with my CPU version. Maybe that just means my CPU code sucks. :wink:
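
For reference, the setup is just the standard render-to-texture pattern; a minimal sketch is below, where `prog`, `drawFullscreenQuad` and the raw-float dump are placeholders for whatever your app actually does:

```cpp
#include <GL/glew.h>
#include <cstdio>
#include <vector>

// Placeholder: submit the buffer-sized quad however your app does it.
void drawFullscreenQuad();

// Render one pass of GLSL program 'prog' into a float texture and dump it raw.
// 'prog' is assumed to be an already-compiled and linked shader program.
void bakeTexture(GLuint prog, int width, int height, const char* path)
{
    GLuint tex = 0, fbo = 0;

    // Float color target for the precomputed data.
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0,
                 GL_RGBA, GL_FLOAT, 0);

    // Offscreen framebuffer with the texture as its color attachment.
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);

    glViewport(0, 0, width, height);
    glUseProgram(prog);
    drawFullscreenQuad();            // the expensive shader runs here

    // Read the result back and write it out as raw floats.
    std::vector<float> pixels(width * height * 4);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, &pixels[0]);
    if (FILE* f = std::fopen(path, "wb")) {
        std::fwrite(&pixels[0], sizeof(float), pixels.size(), f);
        std::fclose(f);
    }

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &fbo);
    glDeleteTextures(1, &tex);
}
```

All the interesting work lives in the fragment shader bound to `prog`; everything else here is boilerplate.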

Anyway, it seems that shaders which take more than about a second cause Windows to freak out. It says the display driver has stopped responding, and I guess it aborts the process… the app hangs or crashes and never finishes what it was doing.

Is there a way to tell Windows/GL/the driver “hey, it’s okay. I know this is gonna take a while, but chill and all will be cool in a couple of seconds”? Or, as is more likely, am I going about this problem all wrong? This sort of procedure seems like a better candidate for compute shaders… but those are at least a couple of months away, I reckon, and not very portable for a year or more until SM5-capable cards are prevalent. Not to mention that I haven’t yet figured out how compute shaders work. :wink: Maybe I should look at CUDA instead?

There are at least 3 different solutions:

  1. Try to tile your quad into smaller quads (see the sketch after this list). Here is why:

Windows Vista and above use a new driver model, the Windows Display Driver Model (WDDM). “When a command buffer spends too long in the graphics chip (more than two seconds), the operating system assumes the chip is hung, kills all the graphics contexts, resets the graphics chip and recovers the graphics driver, in order to keep the operating system responsive.” This 2-second timeout mechanism is called “Timeout Detection and Recovery” (TDR).

“If the application performs extremely GPU intensive and lengthy operations, for example rendering hundreds of fullscreen quads using a complex pixel shader all in a single glDrawElements call, in order to avoid exceeding the 2 second timeout and having an application being killed by Windows Vista’s Timeout Detection and Recovery, split the call into chunks and call glFlush/glFinish between them.”

ref: http://www.opengl.org/pipeline/article/vol003_7/

  2. There are some registry keys you can change to prevent TDR when running benchmarks on slow GPUs:

ref: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx

  3. When TDR happens, a “device removed” event is raised. You can try to listen for this event.

ref: http://msdn.microsoft.com/en-us/library/bb173571.aspx
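
To make option 1 concrete, here is a minimal sketch of one way to split the work: scissor the target into tiles, draw the quad once per tile, and force each chunk to complete before queuing the next. `drawFullscreenQuad` and `tileSize` are placeholders, and the FBO, viewport and program are assumed to be bound already:

```cpp
#include <GL/gl.h>

// Placeholder: submit the buffer-sized quad however your app does it.
void drawFullscreenQuad();

// Split one expensive full-buffer pass into tiles so that no single chunk of
// GPU work comes anywhere near the ~2 second WDDM/TDR limit.
void renderTiled(int width, int height, int tileSize)
{
    glEnable(GL_SCISSOR_TEST);
    for (int y = 0; y < height; y += tileSize)
    {
        for (int x = 0; x < width; x += tileSize)
        {
            glScissor(x, y, tileSize, tileSize);  // restrict rasterization to this tile
            drawFullscreenQuad();
            glFinish();   // drain the command buffer before queuing the next chunk
        }
    }
    glDisable(GL_SCISSOR_TEST);
}
```

Pick the tile size so each chunk stays comfortably under the 2-second limit; glFinish is heavier than glFlush, but it guarantees the previous chunk has actually completed before the next one is submitted.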

Using CUDA, you will face the same issue:

“Vista, Server 2008 and Windows 7 Specific Issues: Individual kernels are limited to a 2-second runtime by Windows Vista. Kernels that run for longer than 2 seconds will trigger the Timeout Detection and Recovery (TDR) mechanism.”

and it is no better on XP:

"Individual GPU program launches are limited to a run time
of less than 5 seconds on a GPU with a display attached.
Exceeding this time limit usually causes a launch failure
reported through the CUDA driver or the CUDA runtime. GPUs
without a display attached are not subject to the 5 second
runtime restriction. For this reason it is recommended that
CUDA be run on a GPU that is NOT attached to a display and
does not have the Windows desktop extended onto it. In this
case, the system must contain at least one NVIDIA GPU that
serves as the primary graphics adapter.
"

ref: http://developer.download.nvidia.com/com…tes_windows.txt

Outstanding response, thanks!