Hi all,

I run my code on a CPU device and everything works, but when I run it on a GPU device I get
error -36, which according to cl.h corresponds to CL_INVALID_COMMAND_QUEUE.
This is the piece of code that has the problem:

while (round <= rounds) {

	printf("Round %u...\n", (unsigned) round);

	/* pass the current round number to the kernel as argument 5 */
	error |= clSetKernelArg(kernel, 5, sizeof(cl_uint), &round);
	if (error != CL_SUCCESS) {
		fprintf(stderr, "ERROR: clSetKernelArg, error code %d\n", error);
		ok = 0;
		goto cleanup;
	}

	/* launch a 1-D NDRange; the call only queues the kernel */
	error = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_size, &local_size, 0, NULL, &event_execute);
	if (error != CL_SUCCESS) {
		fprintf(stderr, "ERROR: clEnqueueNDRangeKernel, error code %d\n", error);
		ok = 0;
		goto cleanup;
	}

	execution_time += execution_time_msecs(event_execute);

When I run it (with global_size=64, local_size=1) on the CPU it works (it goes through all 10 rounds), but on the GPU I get:
Round 0…
Round 1…
ERROR: clEnqueueNDRangeKernel, error code -36

I suspect that synchronization is somehow the problem, so I added clFinish(command_queue), but it still does not work.

Any ideas?

That error usually means that the code has crashed on the gpu - which means you’ve got a bug in your kernel code (or the arguments you’re giving it).

Having it work on a cpu is a good test but it has a totally different execution engine and memory map so bugs manifest themselves differently.
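Since the thread keeps coming back to the raw number -36, a small lookup from error code to name can make the log easier to read. The sketch below is only an illustration: `cl_error_name` and the table are hypothetical helpers, not part of OpenCL, and the table lists only a handful of the codes defined in cl.h.

```c
#include <stddef.h>
#include <string.h>

/* Values as defined in the OpenCL cl.h header; only a small subset. */
struct cl_error_entry {
    int code;
    const char *name;
};

static const struct cl_error_entry cl_error_table[] = {
    {   0, "CL_SUCCESS" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    { -30, "CL_INVALID_VALUE" },
    { -34, "CL_INVALID_CONTEXT" },
    { -36, "CL_INVALID_COMMAND_QUEUE" },
    { -38, "CL_INVALID_MEM_OBJECT" },
    { -52, "CL_INVALID_KERNEL_ARGS" },
    { -54, "CL_INVALID_WORK_GROUP_SIZE" },
};

/* Hypothetical helper: returns the symbolic name for a code,
   or "UNKNOWN" when the code is not in the table above. */
const char *cl_error_name(int code) {
    size_t n = sizeof cl_error_table / sizeof cl_error_table[0];
    for (size_t i = 0; i < n; i++) {
        if (cl_error_table[i].code == code) {
            return cl_error_table[i].name;
        }
    }
    return "UNKNOWN";
}
```

In the loop from the first post this could be used as `fprintf(stderr, "... %d (%s)\n", error, cl_error_name(error));`. Note that the codes are negative, so the value to look up is -36, not 36.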


What are the differences between the CPU and the GPU in executing a kernel?
I mean, when I run the kernel on the GPU with global_size=64 and local_size=1 and then run it with the same
parameters (global_size=64 and local_size=1) on the CPU, what is different except the "command_queue"?
I was thinking that when I don't group the data (local_size=1), there is no difference between running on the CPU and the GPU, so I should get the same result from both (both the CPU and the GPU run the same kernel).

Did I misunderstand something?

I commented out the entire kernel body but nothing changed; I still get that error.


When I comment out "clFinish(command_queue)", the while loop finishes correctly (I can see Round 0 …
through Round 10 … in the output), but after the while loop I get the same error, "error code -36", and this time
it comes from "clEnqueueReadBuffer".

I'm pretty sure that the "command_queue" is the problem because:
1. I commented out the kernel body entirely but I get the same error, so the problem can't be in the kernel
2. I set local_size (the work-group size) to NULL so that OpenCL assigns it itself, and the error remains
3. I set event_execute to NULL and nothing changes
4. the only argument that changes between running on the CPU (which works correctly) and the GPU (which
has the error) is command_queue; the other arguments are the same for the CPU and the GPU

So the only error-prone argument is "command_queue",
but I have no idea what else I can do, because I don't have any access to the command queue and I
don't know how to debug it.
Please help.
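One way to narrow this down, offered as a sketch rather than a definitive fix: enqueue calls such as clEnqueueNDRangeKernel return before the command has actually run, so a failure can be reported by a later call on the same queue (clFinish, the next enqueue, or clEnqueueReadBuffer). That would match the error moving around when clFinish is removed. Checking the return value of every call, with the source location, shows the first call that reports the failure. `cl_check` and `CL_CHECK` below are hypothetical names, and the helper takes a plain `int` so that it compiles without the OpenCL headers.

```c
#include <stdio.h>

/* Hypothetical helper: logs a failed call together with its source location.
   Returns 1 when err is 0 (the value of CL_SUCCESS in cl.h), 0 otherwise. */
static int cl_check(int err, const char *call, const char *file, int line) {
    if (err != 0) {
        fprintf(stderr, "%s:%d: %s failed, error code %d\n",
                file, line, call, err);
        return 0;
    }
    return 1;
}

/* Evaluates the call exactly once and records its text, file and line. */
#define CL_CHECK(call) cl_check((call), #call, __FILE__, __LINE__)
```

For example, `if (!CL_CHECK(clFinish(command_queue))) goto cleanup;` placed right after each enqueue makes the host wait for that command, so the first failing check points at the command that actually broke the queue.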

I found something:
When we run a kernel with global_size=m and local_size=1 on the GPU, OpenCL spreads the kernel
across m "compute units", each of which has only one work-item. But we have a different scenario on
the CPU: I think the only option for running the kernel on the CPU is local_size=1 (the only value that can be assigned to local_size is 1). Maybe with this constraint
we force the CPU to run serially?
Am I correct?
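To make the arithmetic concrete: in OpenCL terms, global_size=m with local_size=1 gives m work-groups of one work-item each, and how those work-groups are mapped onto compute units is up to the implementation. The sketch below only computes the work-group count for a 1-D range; `num_work_groups` is a hypothetical helper, not an OpenCL function. The largest local_size a device accepts for a given kernel is not fixed at 1 in general and can be queried with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-groups in a 1-D NDRange.
   Returns 0 when local_size is 0 or does not divide global_size evenly;
   OpenCL 1.x rejects such a launch with CL_INVALID_WORK_GROUP_SIZE. */
size_t num_work_groups(size_t global_size, size_t local_size) {
    if (local_size == 0 || global_size % local_size != 0) {
        return 0;
    }
    return global_size / local_size;
}
```

With the values from this thread, `num_work_groups(64, 1)` is 64, while a local_size of 16 would give 4 work-groups of 16 work-items each.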