GTX 560 - How many work-items should I set?


I’m a bit lost about how many work-items I should use to run my application,
because the concept of a processing element differs between AMD and Nvidia GPUs (Stream Cores and CUDA Cores, respectively).

The GTX 560 specifications say it has 336 CUDA Cores. Does this mean I have to launch 336 work-items to get the full parallel power of this GPU?


Usually the number of work-items is determined by the problem, e.g. just use one work-item per output pixel or whatever.
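
As a rough sketch of the "one work-item per output pixel" idea: the global size simply follows the image dimensions, rounded up so that each dimension is a multiple of the work-group size (OpenCL 1.x requires this when you pass an explicit local size). The 1920x1080 image and the 16x16 work-group are made-up numbers for illustration, and `round_up` is a hypothetical helper, not part of any API.

```python
def round_up(value, multiple):
    """Return the smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple

# Assumed example values, not taken from the question.
width, height = 1920, 1080
local_size = (16, 16)

# One work-item per pixel, padded up to whole work-groups.
global_size = (
    round_up(width, local_size[0]),
    round_up(height, local_size[1]),
)
print(global_size)
```

With these numbers the height gets padded from 1080 to 1088, so the kernel would need a bounds check to skip the extra work-items.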

Otherwise, a reasonable rule of thumb for all GPUs is Nx64 work-items per compute unit, where N is 4-16 (or more). The ideal N depends as much on the code you’re running as on the architecture it’s running on, so just experiment.
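
To put numbers on that rule of thumb, here is a small sketch. It assumes 7 compute units for the GTX 560 (check what `CL_DEVICE_MAX_COMPUTE_UNITS` reports on your own device); `suggested_work_items` is just an illustrative helper name.

```python
COMPUTE_UNITS = 7  # assumed value for the GTX 560; query your device to confirm
CHUNK = 64         # the "64" in the Nx64 rule of thumb

def suggested_work_items(compute_units, n, chunk=CHUNK):
    """Total work-items for N x 64 work-items on every compute unit."""
    return compute_units * n * chunk

for n in (4, 8, 16):
    print(f"N={n:2d}: {suggested_work_items(COMPUTE_UNITS, n)} work-items")
```

That gives a starting range of 1792 to 7168 work-items, i.e. several times more than the 336 cores, which is the point of the rule.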

You need many threads per core in order to hide memory latency and keep the ALUs busy. If the code is ALU-bound, though, this matters less.


Thanks very much for your answer.

While reading it, a doubt came up.

The GTX 560, for instance, has 7 compute units. So should I create only 7 work-groups, with more work-items inside each?
Or can I create more work-groups with fewer work-items, provided that the number of work-groups is a multiple of 7?

Many thanks,

My answer covered these cases.

Apart from general answers: You have that device, you know what you’re coding - it is up to you to see what works for your case and to experiment to see if improvements can be made.
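
One possible way to run that kind of experiment is a small timing loop over candidate work-group sizes. This is only a sketch: `time_candidates` and `fake_launch` are hypothetical names, and `fake_launch` is a CPU stand-in for your own code that would enqueue the kernel with the given local size and wait for it to finish (e.g. with `clFinish`) before returning.

```python
import time

def time_candidates(run, candidates, repeats=5):
    """Return {candidate: best wall-clock seconds over `repeats` runs}."""
    results = {}
    for candidate in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(candidate)
            best = min(best, time.perf_counter() - start)
        results[candidate] = best
    return results

def fake_launch(local_size):
    # Placeholder workload; replace with a real kernel launch plus a wait.
    sum(range(100_000 // local_size))

timings = time_candidates(fake_launch, [32, 64, 128, 256])
best_size = min(timings, key=timings.get)
print(best_size, timings[best_size])
```

Taking the best of several runs reduces noise from the first launch, which often includes one-off setup costs.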


It’s still not clear to me.

Because when you work with a CPU, you always set the number of threads equal to the number of processing elements. If you use more, the threads compete with each other for the cores. Also, on a CPU the number of processing elements is generally a power of 2, so when you split your workload, you split it into a power-of-2 number of parts.

On the GPU, taking the GTX 560 with its 7 compute units as an example, does the same pattern apply?