Hello everyone! I have an Intel integrated GPU with 4 cores, and when I run the device query it reports 16 compute units. For NVIDIA’s GeForce GTX 260, however, the device query reports 24 compute units, yet the card has 192 cores. Can anyone tell me the relation between CUDA cores and compute units?
NVIDIA people really like to redefine terminology and create confusion / marketing buzz …
Here’s my explanation, hope it helps.
The GTX 260 is based on GT200. Anandtech also has a good article on it here: http://www.anandtech.com/show/2549/2
The equivalent of a “CPU core” (or of what OpenCL defines as a Compute Unit) on NVIDIA GPUs of this architecture is the SM, the Streaming Multiprocessor. Each SM has a vector unit of 8 SPs (Stream Processors). The SP is what NVIDIA refers to as a “CUDA core”, which is quite misleading because these SMs are a SIMD architecture: there is one program counter for all 8 SPs (actually for 32 threads, the warp size, which is the logical vector width).
So the GTX 260 has 24 SMs with 8 SPs each, which adds up to 192 SPs on the die; these are what NVIDIA calls 192 CUDA cores. From the OpenCL perspective there are 24 Compute Units, because it counts the SMs.
A “CUDA core” is one ALU inside the vector unit.
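To make the arithmetic concrete, here is a small Python sketch of the relation described above. The `SPS_PER_SM` table and the `cuda_cores` helper are names made up for illustration; the only value filled in is the 8 SPs per SM of GT200 mentioned in this thread, and other architectures use different numbers.

```python
# Hypothetical lookup table: SPs ("CUDA cores") per SM for a given architecture.
# Only GT200 is listed, using the 8 SPs per SM figure from the explanation above.
SPS_PER_SM = {"GT200": 8}


def cuda_cores(compute_units: int, arch: str) -> int:
    """Convert an OpenCL compute-unit count (number of SMs) to the "CUDA core" count."""
    return compute_units * SPS_PER_SM[arch]


# The device query reports 24 compute units for the GTX 260.
print(cuda_cores(24, "GT200"))  # prints 192
```

So the number shown by the device query is the SM count, and the marketing "core" count is that number multiplied by the vector width of each SM.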