I have an OpenCL program whose kernel takes (__global int4* array) as input. In the host code, I have a std::vector<cl_int> vec that is passed to the kernel for processing.
My question is: Is there a way to enqueue the kernel such that each work-item will process multiple elements from vec?
For instance:
Suppose the input vector has 32 elements in the host code, i.e. std::vector<cl_int> vec(32);
Is there a way to specify that each work-item (each instance of the kernel) should process 8 adjacent elements of vec, as in the code below?
__kernel void process(__global int4* array) {
    int i = get_global_id(0);
    // I want to use the 8 elements handled by this work-item here.
    // Note that int8 = (int4, int4).
    int8 result = (int8)(array[i], array[i + 1]);
}
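To make the mapping I have in mind concrete, here is a minimal host-only C++ sketch with no OpenCL calls. It assumes the work-items should cover disjoint, consecutive groups of 8 elements, so work-item i would read int4 indices 2*i and 2*i + 1. The names elements_for_work_item and work_items_needed are hypothetical, and plain int stands in for cl_int so the sketch compiles without the OpenCL headers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of work-items needed if each one handles
// 8 elements. Assumes n is a multiple of 8.
std::size_t work_items_needed(std::size_t n) {
    assert(n % 8 == 0);
    return n / 8;
}

// Hypothetical helper: the 8 elements work-item `gid` would read,
// assuming disjoint groups (an assumption, not confirmed OpenCL behavior
// of the kernel above).
std::array<int, 8> elements_for_work_item(const std::vector<int>& vec,
                                          std::size_t gid) {
    assert(8 * gid + 8 <= vec.size());
    const std::size_t first_int4 = 2 * gid;  // int4 index of the low half
    std::array<int, 8> out{};
    for (std::size_t k = 0; k < 8; ++k) {
        // Element k of the group: 4 ints per int4, two int4s per group.
        out[k] = vec[4 * first_int4 + k];  // same as vec[8 * gid + k]
    }
    return out;
}
```

Under that assumption, vec(32) would need 4 work-items, and work-item 3 would cover elements 24 through 31.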
I am new to OpenCL so any help is deeply appreciated.