Hi,
I found some strange behaviour while debugging a kernel.
I use this:
struct my_debIdx {
int tableau[1];
};
static int* debIdx[1] = {0};
my_debIdx debIdx;
to retrieve the number of times I set a special value in out[global index].x.
I create the GPU buffer:
debugIdx = cl::Buffer(gContext, CL_MEM_READ_WRITE|CL_MEM_USE_HOST_PTR, 1*sizeof(int), debIdx, NULL);
I pass the debug buffer to the kernel as
__global int* __restrict__ debugIdx
and each time I set out[global index].x, I count it:
(*debugIdx)++;
Then I retrieve the information using
gQueue.enqueueReadBuffer(debugIdx, CL_TRUE, 0, 1*sizeof(int), debIdx.tableau);
LOGI(" debug0: indIdx value %5d \n",debIdx.tableau[0]);
// reinit the buffer to zero for another use.
debIdx.tableau[0] = 0;
gQueue.enqueueWriteBuffer(debugIdx, CL_TRUE, 0, 1*sizeof(int), debIdx.tableau);
I work with cl::NDRange(1024,1024) as the global size and cl::NDRange(2,2) as the local size, so the number of work-groups to be processed is (1024 / 2)^2 = 512^2 = 262144.
When I work with a small buffer (36 * 36) everything is fine, but with (1024 * 1024) the count coming out of the kernel is a lot smaller than what I get when I retrieve the data later.
This is what I get from the kernel:
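To double-check the work-group arithmetic, here is a small host-side C++ sketch. The helper names are made up for illustration and are not part of the OpenCL API; it only assumes that each global dimension is an exact multiple of the local size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of work-groups along one dimension,
// assuming the global size is an exact multiple of the local size.
std::size_t work_groups_per_dim(std::size_t global, std::size_t local) {
    assert(local > 0 && global % local == 0);
    return global / local;
}

// Hypothetical helper: total number of work-groups for a 2D range.
std::size_t total_work_groups_2d(std::size_t gx, std::size_t gy,
                                 std::size_t lx, std::size_t ly) {
    return work_groups_per_dim(gx, lx) * work_groups_per_dim(gy, ly);
}
```

Calling total_work_groups_2d(1024, 1024, 2, 2) gives 512 * 512 = 262144, and the small 36 * 36 case with the same local size gives 18 * 18 = 324.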
debug0: indIdx value 1601
and this is what I retrieve from the kernel's output buffer once it is passed to the CPU:
void Extraction_Point: buf.bufligne Rouge: 25467
So the kernel gives 1601, while extracting the value from the output buffer gives 25467, which is the correct value, no doubt.
Maybe I made an error somewhere?
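One thing I have not ruled out: (*debugIdx)++ is a plain load, add and store, so if several work-items execute it at the same time, some increments could be lost. The toy model below is host-only C++, not OpenCL, and only illustrates that idea. The function names and the `lanes` parameter (how many work-items are assumed to load the same old value together) are invented for the sketch and say nothing about the real hardware.

```cpp
#include <algorithm>
#include <cassert>

// Toy model of a non-atomic counter++ : `increments` work-items each do a
// separate load and store. `lanes` (hypothetical) is how many of them run in
// lockstep and therefore load the same old value.
int simulate_plain_increment(int increments, int lanes) {
    assert(increments >= 0 && lanes > 0);
    int counter = 0;
    for (int done = 0; done < increments; done += lanes) {
        int batch = std::min(lanes, increments - done);
        int loaded = counter;          // every lane in the batch reads the same value
        for (int lane = 0; lane < batch; ++lane) {
            counter = loaded + 1;      // every lane stores the same result
        }
    }
    return counter;
}

// Same model with an indivisible read-modify-write: no update is lost.
int simulate_atomic_increment(int increments) {
    assert(increments >= 0);
    int counter = 0;
    for (int i = 0; i < increments; ++i) {
        counter += 1;                  // stands in for one atomic increment
    }
    return counter;
}
```

In this model, 8 increments with 4 lanes leave the plain counter at 2, while the atomic version reaches 8. If this is what happens in my kernel, the OpenCL C built-in atomic_inc(debugIdx) (available for __global int since OpenCL 1.1) would be the thing to try instead of (*debugIdx)++.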