An array `arr` is passed to the kernel with some data. Inside the kernel I need a temporary array of the same size as `arr`. How can I allocate it? I tried to pass the size of `arr` from the host to the kernel and to declare an array with that size (e.g. `int temp[arrSize];`),
but the compiler doesn't accept this because `arrSize` is not a compile-time constant, and `malloc()` is not supported in OpenCL C.
How can I create a temporary array of the same size as an existing one inside a kernel?
Option 1: recompile the code with a `#define` that matches the problem size, or some upper limit on the problem size.
Option 2: use local memory, and manually make sure each work item works on its own pool. Index it as `local id + index * local work size` so that element accesses are interleaved across work items and avoid bank conflicts. This only works if the whole pool fits in local memory: you need to allocate `arrSize * local work size` elements so that each work item has its own block.
Option 3: pass in a global-memory buffer, allocated on the host and big enough for all work items, and manually make sure each work item works on its own pool. Use similar interleaved indexing (`global id + index * global work size`) so that accesses are coalesced; i.e. you need to allocate `arrSize * global work size` elements so that each work item has its own block.
Option 1 is the easiest if you know the problem is bounded by some reasonable upper limit.
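A minimal sketch of option 1, assuming a hypothetical bound name `MAX_ARR_SIZE` (you would inject the real value from the host via build options, e.g. `-DMAX_ARR_SIZE=1024` passed to `clBuildProgram`):

```c
// Option 1: fix an upper bound on the array size at compile time.
// MAX_ARR_SIZE is an assumed name; define it from the host with e.g.
//   clBuildProgram(prog, 1, &dev, "-DMAX_ARR_SIZE=1024", NULL, NULL);
#ifndef MAX_ARR_SIZE
#define MAX_ARR_SIZE 1024
#endif

__kernel void process(__global const float *arr, const int arrSize)
{
    float temp[MAX_ARR_SIZE];         // size is now a compile-time constant

    for (int i = 0; i < arrSize; ++i) // only the first arrSize slots are used
        temp[i] = arr[i];
    // ... work on temp ...
}
```

If `arrSize` ever exceeds the bound, the host must detect that and rebuild the program with a larger define.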
Option 3 is the closest to how a runtime implements option 1 internally - 'private arrays' are just private ranges of global memory.
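Option 3 might look like the sketch below. The kernel name and argument layout are assumptions; the key point is the interleaved index `i * global_size + gid`, which gives each work item a disjoint slice while keeping simultaneous accesses coalesced:

```c
// Option 3: the host allocates a scratch buffer of
// arrSize * global_work_size floats and passes it in.
__kernel void process(__global const float *arr,
                      __global float *scratch,  // arrSize * get_global_size(0) elements
                      const int arrSize)
{
    const size_t gid = get_global_id(0);
    const size_t gsz = get_global_size(0);

    // Element i of this work item's private pool lives at scratch[i * gsz + gid],
    // so adjacent work items touch adjacent addresses on each iteration.
    for (int i = 0; i < arrSize; ++i)
        scratch[i * gsz + gid] = arr[i];
    // ... work on scratch[i * gsz + gid] ...
}
```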
It depends on how you access it. If you use fixed indices, or at least indices known at compile time, it should be registerised, provided it fits in the register file.
If you use dynamic indices, or the array is too big, then yes, it goes into global memory - there's nowhere else for it to go.
The only real private memory a GPU has is registers. The next closest thing is local memory, which can be used in a private way if you address it properly. I almost always use local memory this way when I need an internal private array and have the space.
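A sketch of that local-memory pattern (option 2 above), with assumed names. The host passes the pool size via `clSetKernelArg` with a NULL pointer, and the stride-by-local-size indexing spreads each work item's elements across banks:

```c
// Option 2: local memory as a per-work-item private pool.
// The host reserves arrSize * local_work_size floats of local memory:
//   clSetKernelArg(kernel, 1, sizeof(float) * arrSize * localSize, NULL);
__kernel void process(__global const float *arr,
                      __local float *pool,      // arrSize * get_local_size(0) elements
                      const int arrSize)
{
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);

    // Element i of this work item's pool is pool[i * lsz + lid]: on any given
    // iteration, neighbouring work items hit neighbouring banks.
    for (int i = 0; i < arrSize; ++i)
        pool[i * lsz + lid] = arr[i];
    // ... work on pool[i * lsz + lid] ...
}
```

The total local allocation must fit within `CL_DEVICE_LOCAL_MEM_SIZE` (commonly tens of KB), which is what limits how large `arrSize * local_work_size` can be.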
This information is in the various programming guides from the vendors and has been mentioned on forums before. e.g. see section 4.9, page 4-43 of the AMD APP programming guide 1.3f - much of that is representative of all GPU hardware.