Max __constant variables defined in program source

CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE can be used to get the max number of arguments declared with the __constant qualifier in a
kernel. Also, max size of a constant buffer can be retrieved…

But is there a limitation for global variables declared in the program source with the __constant qualifier?

I believe you mean the CL_DEVICE_MAX_CONSTANT_ARGS constant that can used when querying a device with clGetDeviceInfo. The spec says the following:


Max number of arguments declared with the __constant qualifier in a kernel. The minimum value is 8.

__constant variables are stored in global memory, that also for example in this region that memory objects point to.

The size of this memory, for a device, is given by CL_DEVICE_GLOBAL_MEM_SIZE (4.2 Querying Devices).

coleb, true, I pasted wrong enum.
matrem, I know constant variables reside in global memory and are cached, I just though maybe there was another limitation. Thanks.

Only the size CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE (minimum of 64KB) and the number of arguments in the kernel definition CL_DEVICE_MAX_CONSTANT_ARGS (minimum of 8).

The way I read the spec though you can have more __constant pointers in your code like the following:

__kernel void MyKernel(__constant float* arg)
  __constant float* ptr = &arg[100];

So if you’re coming up against the CL_DEVICE_MAX_CONSTANT_ARGS limit you can pack your data into one argument and then split it out again on the device into multiple pointers.

My understanding is that a kernel can not use more than the max constant buffer size. At least in PTX I believe this is a separate memory space which has different allocation limits. (I may both be wrong and this may change with future hardware, particularly with Fermi which appears to unify all the address spaces.)

I’m curious what has happened to the constant cache in Fermi. The graphic for the previous generation of SM clearly shows a “C cache”:

While the new SM for the fermi architecture has the “Configurable L1/Shared Memory” plus a “Unified Cache”.

What’s the difference between configured L1 and “Unified Cache”? And if constant caches are a thing of the past do the new caches still have the same problem that two work-items in a warp accessing different addresses will cause a serialization?

Not to mention the question, how does one “configure” the L1 cache? Hopefully this done based on the shared memory needs of the specific kernel being launched. But then again, how does this balance out with the ability to run multiple kernels on the device at once? Or is it still only one kernel per SM at a time?

I’m rambling… I should get back to coding…

My constants are defined in global program source e.g.: __constant int something = 1;
It sucks if I’m limited to use only CL_DEVICE_MAX_CONSTANT_ARGS of these from kernels, because they are not in the kernel parameter list. You think OpenCL automatically sees them as arguments ?

The standard seems to indicate a difference between constant kernel arguments and the __constant variables at program scope. Though it’s not clear what’s the safe way to initialize a program scope constant variable.