Controlling GPU Contention.

How can I do this? Currently, when my global work size gets above a certain point (10000 bodies in my Barnes-Hut n-body sim), my entire PC freezes up. Only the mouse moves, for the duration of the entire iteration (i.e. it appears that all 10000 work items are processed with no other GPU activity allowed). Is there any way to allow other GPU operations to be interleaved with my work? Do I have to break up my clEnqueueNDRangeKernel call into a few separate calls? I am leaving work group organisation to OpenCL as it is too much of a pain to do it myself; not sure if maybe this has something to do with it?
OS = Win 7

This is expected and documented behaviour, and it is in general a hardware limitation of most devices. I think the newer hardware (AMD 7xxx, and the NVIDIA stuff) supports ‘multitasking’, but I’m not sure how complete the driver support is. It’s definitely something all the vendors are looking at addressing fully in the future.

The alternatives to splitting up the code (which may run faster anyway, if the compiler can save registers on smaller loops) are:

  1. Make your code run faster so it doesn’t take so long. Bad but correct code can be 1-2 orders of magnitude slower than tuned code.
  2. Use a second card for the processing so you don’t notice it.
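If you do go the splitting route, one way to read it is to issue the range in slices rather than in one call. The following is only a rough sketch of the bookkeeping, not a tested recipe: `chunk_size_at` and `count_chunks` are hypothetical helper names, and the slice size of 4096 is an arbitrary assumption you would need to tune. The commented-out loop assumes an existing `queue` and `kernel`, and relies on the `global_work_offset` argument of clEnqueueNDRangeKernel, which OpenCL 1.0 requires to be NULL (so this needs OpenCL 1.1 or later).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work items in the slice that starts at
 * `offset`, when `total` items are issued at most `chunk` at a time. */
size_t chunk_size_at(size_t total, size_t chunk, size_t offset)
{
    assert(chunk > 0 && offset <= total);
    size_t remaining = total - offset;
    return remaining < chunk ? remaining : chunk;
}

/* Hypothetical helper: how many separate enqueues the slicing produces. */
size_t count_chunks(size_t total, size_t chunk)
{
    size_t n = 0;
    size_t offset = 0;
    while (offset < total) {
        offset += chunk_size_at(total, chunk, offset);
        n++;
    }
    return n;
}

/* With a real queue and kernel (assumed to exist), the loop would be roughly:
 *
 *   for (size_t offset = 0; offset < total; ) {
 *       size_t size = chunk_size_at(total, chunk, offset);
 *       clEnqueueNDRangeKernel(queue, kernel, 1, &offset, &size, NULL,
 *                              0, NULL, NULL);
 *       clFinish(queue);   // wait, so other GPU work can get in between
 *       offset += size;
 *   }
 *
 * get_global_id(0) includes the offset, so the kernel body is unchanged.
 */
```

For 10000 bodies and slices of 4096 this gives three enqueues of 4096, 4096 and 1808 work items. Whether the desktop stays responsive between slices depends on the driver, so treat the slice size as something to experiment with.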

How is ‘work group organisation’ too much of a pain? All you need to do is choose something that suits your algorithm and hardware (and all GPU hardware uses the same general rules) and then make your problem fit. There are some severe performance penalties for choosing bad work sizes (e.g. prime numbers), and it takes 2 lines of code to select known-good work sizes (e.g. round up to multiples of 64).
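As a minimal sketch of that rounding, assuming a 1D range and a work-group size of 64: `round_up` is a hypothetical helper name, and the `step` kernel in the string is an illustrative placeholder, not your actual kernel. Because the padded range is larger than the real problem, the kernel needs the real count as an argument and has to skip the extra work items.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of `multiple` (e.g. 64). */
size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Hypothetical kernel source: `n` is the real number of bodies, passed in
 * as a kernel argument so the padding work items can bail out early. */
const char *kernel_src =
    "__kernel void step(__global float4 *bodies, const uint n)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    if (i >= n) return;   /* padding work item: nothing to do */\n"
    "    /* ... per-body work on bodies[i] ... */\n"
    "}\n";
```

For 10000 bodies, `round_up(10000, 64)` gives a global size of 10048 with a local size of 64, and the 48 padding work items return immediately.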

Thanks for the info. I did actually try doing my own splitting of the work groups and ran into a few problems, but yes, it isn’t difficult to set up (what you said about rounding up the global work size to the next multiple of 64 made me realize why so many kernels I have seen do a bounds check on the global id, so thanks for that!).