Scheduling kernel to prevent driver from crashing (watchdog)

My global work size is {128; 128; 4096}. The calculations are pretty intense, so every time I start my kernel, the Windows watchdog kicks in and makes the GPU driver crash (the screen goes black).

Right now I’m enqueueing the kernel in the following way (using OpenCL.Net wrapper for C#):

IntPtr[] workGroupSizePtr = new IntPtr[] { (IntPtr)128, (IntPtr)128, (IntPtr)4096};
error = Cl.EnqueueNDRangeKernel(cmdQueue, kernel, 3, null, workGroupSizePtr, null, 0, null, out clevent); 

As you can see, I’m letting OpenCL decide my local work size. Is that optimal in my case, or should I set it myself? What would be the optimal local work size? I don’t really understand the concept of work-groups: why would I want to have several of them?

How can I divide my large task into a few smaller subtasks and prevent the GPU driver from crashing? As far as I understand, the Windows watchdog kills the kernel because it’s taking too long to execute. What steps can be taken to prevent this behavior?

Thank you!

You are asking multiple questions. I will address them separately.

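That is a lot of global work: 128 × 128 × 4096 is about 67 million work-items in a single launch. You will need to divide your global work into many smaller kernel launches, each short enough to finish before the watchdog timeout (2 seconds by default on Windows); otherwise your kernels will get terminated.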
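One way to split the work, sketched here in plain C so it runs without a GPU: cut the Z dimension into slices and launch the kernel once per slice. The names z_slice and split_z are hypothetical helpers made up for this example, not part of OpenCL or OpenCL.Net, and the slice size of 256 used below is only a guess that you would have to tune for your hardware.

```c
#include <stddef.h>

/* One slice of the Z dimension: where it starts and how many items it covers. */
typedef struct {
    size_t offset;
    size_t size;
} z_slice;

/* Hypothetical helper: split total_z into slices of at most max_slice items.
   Stores at most `capacity` slices in `out` and returns how many slices are
   needed in total (so the caller can detect a too-small buffer). */
size_t split_z(size_t total_z, size_t max_slice, z_slice *out, size_t capacity)
{
    size_t count = 0;
    size_t start;

    if (max_slice == 0) {
        return 0; /* nothing sensible to do with an empty slice size */
    }
    for (start = 0; start < total_z; start += max_slice) {
        size_t remaining = total_z - start;
        if (count < capacity) {
            out[count].offset = start;
            /* The last slice may be shorter than max_slice. */
            out[count].size = remaining < max_slice ? remaining : max_slice;
        }
        count++;
    }
    return count;
}
```

For each slice you would then enqueue the kernel with a global work offset of {0, 0, offset} and a global work size of {128, 128, size}, and wait for it to finish (clFinish, for example) before enqueueing the next, so that each launch is a separate, short submission. With a non-NULL offset (OpenCL 1.1 and later), get_global_id(2) in the kernel already includes the offset, so the kernel itself should not need to change. In the call from the question, the offset is presumably the argument that is currently null, just before the global size.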

Letting the runtime pick your local size can be convenient, but for optimal performance you should specify the local work size yourself. The vendor documentation gives guidance on the best sizes to use, but it may be easier to just try the valid candidates and select the fastest. Also, if your global work sizes do not divide evenly (a prime size, for example), the runtime may fall back to a local work size of 1, which is really suboptimal.
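If you go the "try them all" route, a rough sketch of generating the candidates might look like the following (plain C, no OpenCL needed). It only lists power-of-two local sizes that divide the global size evenly and whose product stays within a maximum work-group size you supply. The names local_size and candidate_local_sizes are invented for this example, and the sketch assumes you have already queried the device limits yourself; it does not check the per-dimension maximum.

```c
#include <stddef.h>

typedef struct {
    size_t x, y, z;
} local_size;

/* Hypothetical helper: enumerate power-of-two local sizes that divide the
   global size in every dimension and contain at most max_group_size
   work-items. Stores at most `capacity` entries in `out` and returns the
   total number of candidates found. */
size_t candidate_local_sizes(const size_t global[3], size_t max_group_size,
                             local_size *out, size_t capacity)
{
    size_t count = 0;
    size_t x, y, z;

    for (x = 1; x <= global[0] && x <= max_group_size; x *= 2) {
        if (global[0] % x != 0) continue;
        for (y = 1; y <= global[1] && x * y <= max_group_size; y *= 2) {
            if (global[1] % y != 0) continue;
            for (z = 1; z <= global[2] && x * y * z <= max_group_size; z *= 2) {
                if (global[2] % z != 0) continue;
                if (count < capacity) {
                    out[count].x = x;
                    out[count].y = y;
                    out[count].z = z;
                }
                count++;
            }
        }
    }
    return count;
}
```

You would then time one kernel launch per candidate (with profiling events, for instance) and keep the fastest. Do the timing on a small slice of the problem so the benchmark itself does not trip the watchdog.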

There are some rules about this. The global work size must be a multiple of the local work size in each dimension, so if your local size does not divide it evenly, round the global work size up to the next multiple (and then check in your kernel that the get_global_id values are inside the actual work size you want).
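The rounding rule above can be sketched like this in plain C. Both function names are hypothetical; the second one is just a host-side model of the check the kernel would do, so the logic can be tried without a device.

```c
#include <stddef.h>

/* Round global_size up to the next multiple of local_size.
   Assumes local_size > 0. */
size_t round_up_global(size_t global_size, size_t local_size)
{
    size_t remainder = global_size % local_size;
    return remainder == 0 ? global_size
                          : global_size + (local_size - remainder);
}

/* Host-side model of the kernel guard: work-items whose id lands in the
   padding added by the rounding must do nothing. Returns 1 if the id is
   inside the real problem, 0 if it is padding. */
int is_inside_real_range(size_t global_id, size_t real_size)
{
    return global_id < real_size;
}
```

For example, a real size of 1000 with a local size of 64 gives a padded global size of 1024. In the kernel itself the guard would be a line such as "if (get_global_id(0) >= real_size) return;" at the top, with real_size passed in as an extra kernel argument (that argument name is an assumption for this example).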

Kernels that take advantage of shared local memory often require specific local work sizes to work (unless they are written to be flexible, but that is outside the scope of this post).

It is also worth mentioning that your work size layout is not ideal, because on GPUs the z dimension usually has the smallest limit. On my GTX 580, the maximum work-item sizes are {1024, 1024, 64}. This means your problem, with 4K in z, will be divided into {1, 1, 64} work-groups, whereas if you can change the work sizes to {4K, 256, 256} it would be only {4, 1, 4}. So think about reordering your problem if that is possible.


There is also a way to disable the watchdog in the registry. I haven’t done it myself yet, but you should be able to find it by ecosia’ing (a little plug for my favorite search engine :) ). The feature is called Timeout Detection and Recovery (TDR).