Arrays in the cl program

I am having trouble creating arrays in my .cl program without using #define arraysize and declaring static arrays as array[arraysize].

I know dynamic arrays are not allowed, but is there any way to pass a global constant integer through __kernel? From what I understand __kernel only allows pointers for things with class __global. And if I pass a non global integer as my arraysize, the program tells me it is a dynamic array and thus not allowed.

I have also tried passing pre-made arrays through __kernel. However, when I do it with __global and __local there are memory sharing problems. I think I need things with __private as the class so that memory is not shared between threads, but __kernel doesn’t allow you to pass __private according to the OpenCL handbook. I am not sure I understand how to use __private either.

From what I understand __kernel only allows pointers for things with class __global.

Kernels can also accept pointers to variables in __constant memory.

I’m not really sure of what you are trying to do. If you are trying to pass constants, either a pointer to __global or a pointer to __constant will do the trick.

Sounds like you’re just talking about a normal global pointer but with per-work item indexing.

If you want each thread to access a separate part of the global pointer, just index it based on the global work id(s).

e.g. if I want every thread to have ‘n’ floats of their own data (work size is 1d):
kernel void
something(global float *globalunshared, int n) {
    // advance the pointer to the start of this work item's slice
    globalunshared += get_global_id(0) * n;

    // thread can now access globalunshared[0] .. globalunshared[n-1] with exclusivity
}

Of course, you must allocate globalunshared with room for workitems * n floats in the host code.
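
The allocation rule can be checked on the host without a device. What follows is a minimal plain-C sketch, not OpenCL: the loop variable gid stands in for get_global_id(0), and total_floats, slice_for and emulate_launch are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of floats the host must allocate so that every work item
   owns n floats of its own (hypothetical helper). */
size_t total_floats(size_t work_items, size_t n) {
    return work_items * n;
}

/* Start of the slice owned by work item gid; mirrors
   "globalunshared += get_global_id(0) * n" in the kernel. */
float *slice_for(float *base, size_t gid, size_t n) {
    assert(base != NULL);
    return base + gid * n;
}

/* Emulates a 1d launch: each "work item" writes only its own slice and
   the writes array counts how often each element was touched.
   Returns 1 if every element was written exactly once, 0 otherwise. */
int emulate_launch(float *buffer, int *writes, size_t work_items, size_t n) {
    for (size_t gid = 0; gid < work_items; ++gid) {
        float *mine = slice_for(buffer, gid, n);
        for (size_t i = 0; i < n; ++i) {
            mine[i] = (float)gid;
            writes[(size_t)(mine - buffer) + i] += 1;
        }
    }
    for (size_t k = 0; k < total_floats(work_items, n); ++k) {
        if (writes[k] != 1) {
            return 0;
        }
    }
    return 1;
}
```

If the buffer were allocated with only n floats, the same loop would write past the end for every gid above 0, which is the kind of conflict that shows up as garbage on the device.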

If you want each work item to have a different amount of memory, that isn’t much harder:
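kernel void
something(global float *globalunshared, global int *globalsizes) {
    int n = globalsizes[get_global_id(0)];
    // proceed as before, except that the start offset is no longer
    // get_global_id(0) * n: it is the sum of the sizes of all preceding work items
}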

Where globalsizes has been initialised previously.
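
With per-work-item sizes, each work item's slice starts at the sum of the sizes before it. One way to handle that (an assumption on my part, the post does not say how the offsets are obtained) is to compute an exclusive prefix sum on the host and pass it to the kernel as a second global buffer. A minimal plain-C sketch, with compute_offsets as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Fills offsets[gid] with sizes[0] + ... + sizes[gid-1] (an exclusive
   prefix sum) and returns the total number of elements to allocate. */
size_t compute_offsets(const int *sizes, size_t *offsets, size_t work_items) {
    size_t total = 0;
    for (size_t gid = 0; gid < work_items; ++gid) {
        assert(sizes[gid] >= 0);      /* a negative size makes no sense */
        offsets[gid] = total;         /* where this work item's data starts */
        total += (size_t)sizes[gid];
    }
    return total;
}
```

The return value is the number of floats to allocate for globalunshared; inside the kernel, work item gid would then start at globalunshared + offsets[gid].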

There are lots of other possibilities too, e.g. using atomics to ‘allocate’ a range dynamically during kernel run-time - the only caveat is the host code must pre-allocate a fixed maximum amount ahead of time.
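
The atomic 'allocation' idea can be sketched as a bump counter over a pre-allocated pool. This is plain C11 using stdatomic.h, not OpenCL; atomic_fetch_add returns the old value, as OpenCL C's atomic_add does. The name reserve_range and the -1 failure convention are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Reserves count elements from a pool of capacity elements.
   Returns the start index of the reserved range, or -1 if the pool is
   exhausted. next plays the role of a counter kept in global memory. */
int reserve_range(atomic_int *next, int count, int capacity) {
    assert(count >= 0);
    int start = atomic_fetch_add(next, count);  /* returns the old value */
    if (start + count > capacity) {
        return -1;  /* the counter stays bumped; this sketch never frees */
    }
    return start;
}
```

The fixed capacity is the caveat mentioned above: the host has to pick the maximum before the kernel runs, and a request that does not fit simply fails.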

I’m not suggesting this is very efficient or that you should want to do it, but it’s certainly trivially possible. Accessing blocks like this per-thread WILL be inefficient: you should stride them by the work-size (or something similar), and a ‘traditional’ allocated block might not work at all well per-thread. Generally speaking, memory blocks should be accessed per-workgroup, even if the elements in the block are per-thread specific.
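
To make the striding remark concrete, here are the two index calculations side by side. This is my reading of "stride them by the work-size", written as plain C; blocked_index and interleaved_index are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Blocked layout: work item gid owns n consecutive elements. */
size_t blocked_index(size_t gid, size_t i, size_t n) {
    assert(i < n);
    return gid * n + i;
}

/* Interleaved layout: element i of every work item is stored together,
   so neighbouring work items touch neighbouring addresses. */
size_t interleaved_index(size_t gid, size_t i, size_t work_items) {
    assert(gid < work_items);
    return i * work_items + gid;
}
```

In the blocked layout, work items 0 and 1 reading their element 0 are n floats apart; in the interleaved layout they are adjacent. Both layouts need the same work_items * n floats.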

I’ve now been coding in OpenCL full-time for over a year and have never once considered using ‘private’. It’s just an annotation needed for the compilation process, not something you need during development.

I think what you are saying is that I should pass in the arrays I want and have each work item use a different part of them based on the global id? I have tried doing that, but there are still memory conflict issues because of the functions I have in the .cl program. I think it is because I want the arrays to temporarily hold data but be rewritten with new values on every iteration of the for loop that calls the function, and passing in arrays doesn’t allow them to be rewritten? I get incorrect values for whatever is assigned to the first global id, and NaN (not a number) for all the others.

What I basically want to do is create arrays that can temporarily hold data for a function in the cl file which is used later on by another function in the cl file. This array needs to be able to be written over because it is holding data for values in a loop and each time the loop goes around, new data should be put into it.
Using #define, I have been able to make a static array by defining the size of the array, and the cl program works. However, I want to be able to do this without using #define because I want the size of my array to come from a number I compute while reading the input file (in main).

Well that just means your addressing is incorrect. It really has to be globally unique for every work item. You can’t re-use the same indices since they could potentially all be run concurrently.

Doing this:
kernel void
somekernel() {
    float somebuffer[SOMESIZE];

    callfunction(somebuffer);
}

is completely identical (numerically) to this:

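kernel void
somekernel(global float *globalbuffer, int somesize) {
    // this work item's slice of the shared buffer
    global float *somebuffer = globalbuffer + get_global_id(0) * somesize;

    // callfunction must take a global float * here
    callfunction(somebuffer);
}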

except “somesize” can be dynamic as you require.

I suspect what you’re doing is in host code just allocating SOMESIZE and then trying to use it like this:
kernel void
somekernel(global float *somebuffer, int somesize) {
    // every work item gets the same pointer, so they all overwrite each other
    callfunction(somebuffer);
}

But you need to allocate SOMESIZE * worksize instead, and index based on the global id.
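
A slice of a global buffer can be overwritten on every loop iteration just like a local array. The plain-C sketch below emulates that on the host: the outer loop over gid stands in for the work items, and fill_scratch and sum_scratch are hypothetical stand-ins for the two functions in the .cl file (their real contents are not shown in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical first function: fills the scratch slice with new values. */
void fill_scratch(float *scratch, int somesize, int iteration, size_t gid) {
    for (int i = 0; i < somesize; ++i) {
        scratch[i] = (float)(iteration + i) + (float)gid;
    }
}

/* Hypothetical second function: consumes what the first one wrote. */
float sum_scratch(const float *scratch, int somesize) {
    float total = 0.0f;
    for (int i = 0; i < somesize; ++i) {
        total += scratch[i];
    }
    return total;
}

/* Emulates one launch. globalbuffer must hold worksize * somesize floats
   and results must hold worksize floats. */
void emulate_kernel(float *globalbuffer, float *results,
                    size_t worksize, int somesize, int iterations) {
    assert(somesize > 0);
    for (size_t gid = 0; gid < worksize; ++gid) {  /* one pass per work item */
        float *somebuffer = globalbuffer + gid * (size_t)somesize;
        float acc = 0.0f;
        for (int it = 0; it < iterations; ++it) {
            fill_scratch(somebuffer, somesize, it, gid);  /* overwrite the slice */
            acc += sum_scratch(somebuffer, somesize);     /* then consume it */
        }
        results[gid] = acc;
    }
}
```

Because each gid only ever touches its own somesize floats, the result for one work item does not depend on the order in which the others run.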

Of course if you posted an example such things would be obvious.