Simple question about clFinish()

vincentfpgarcia · February 5, 2013, 12:24am

Hi,

Simple question related to the code below. It’s not real code and it doesn’t compile; it’s just a simple example.


cl_mem image0;
cl_mem image1;
cl_mem image2;
cl_mem buffer;

...

// Step1
f(image0, image1, 1);
f(image0, image2, 2);
g(image1, image2, buffer);

// Step2
f(image0, image1, 3);
f(image0, image2, 4);
g(image1, image2, buffer);

// Step3
f(image0, image1, 5);
f(image0, image2, 6);
g(image1, image2, buffer);

We basically have 3 images allocated on the device. The function f(input, output, p) applies a kernel that fills the output with values read from the input, given a parameter p. For instance, f could be a Gaussian smoothing where p is the variance of the Gaussian kernel.

The function g takes two images as input and a buffer as output. In g, the kernel analyses the two inputs and writes something to the output buffer. For instance, g could detect the local maxima in both inputs.

Because we apply the “algorithm” 3 times here (3 steps) and because we reuse the same memory at each step (the images are reused and buffer grows at each step), I was thinking that maybe I should call clFinish() between steps. I’m afraid that if I don’t, step 2 may start before step 1 has finished, which would make g in step 1 read images that step 2 is already overwriting.

What do you think?

Thanks!

matthiasv · February 5, 2013, 12:40am

I think you should look into OpenCL events, set up your task graph with in-order queues, and do a blocking read at the end. You should also read what Intel has to say about explicit synchronization on CPUs.

clint3112 · February 5, 2013, 1:18am

Do you append data to buffer, or do you write to the same location in buffer each time?

vincentfpgarcia · February 5, 2013, 5:05am

I append data to the buffer using function g at the end of each step. I’m afraid that the blocking read at the end won’t be enough but I’m not sure, hence my question.

matthiasv · February 5, 2013, 5:22am

If you set up everything correctly using events, the last blocking read will be sufficient. It’s just necessary to start the computation.

EDIT: Actually, if you use a single in-order queue, events aren’t even necessary, because the enqueued commands are executed in order of submission and each one waits until all preceding commands have finished.

vincentfpgarcia · February 5, 2013, 5:46am

If I understand what you are saying: if I use a simple command queue (i.e. I don’t specify CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE), then everything I put in the queue will be executed in order, and each command will start only when the previous one has finished? If so, that’s perfect and I don’t have anything more to do. Can you confirm?

matthiasv · February 5, 2013, 5:59am

The standard is clear on this (emphasis mine):

If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a command-queue is not set, the commands enqueued to a command-queue execute in order. For example, if an application calls clEnqueueNDRangeKernel to execute kernel A followed by a clEnqueueNDRangeKernel to execute kernel B, the application can assume that kernel A finishes first and then kernel B is executed. If the memory objects output by kernel A are inputs to kernel B then kernel B will see the correct data in memory objects produced by execution of kernel A. If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a command-queue is set, then there is no guarantee that kernel A will finish before kernel B starts execution.
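
To make the ordering guarantee concrete, here is a minimal plain-C model of an in-order queue running the three steps from the question. It is only a sketch and makes no OpenCL API calls: the types Device, Command and InOrderQueue and the functions enqueue, execute, blocking_read and run_three_steps are all hypothetical names invented for illustration. Each image is reduced to a single int, f is assumed to compute output = input * p, and g is assumed to append image1 + image2 to the buffer. In real host code, enqueue would roughly correspond to clEnqueueNDRangeKernel and blocking_read to clEnqueueReadBuffer with blocking_read set to CL_TRUE. A real device may also start executing commands before the read is issued; the model defers everything to the read only to keep the code short.

```c
#include <stddef.h>

/* Hypothetical stand-in for the device memory objects in the question.
   Each "image" is a single int to keep the model tiny. */
typedef struct {
    int image0, image1, image2;
    int buffer[3];        /* g appends one result per step */
    size_t buffer_len;
} Device;

typedef enum { CMD_F_IMAGE1, CMD_F_IMAGE2, CMD_G } CmdKind;

typedef struct { CmdKind kind; int p; } Command;

#define QUEUE_CAPACITY 16

/* FIFO of recorded commands: head is the next to run, tail the next free slot. */
typedef struct {
    Command cmds[QUEUE_CAPACITY];
    size_t head, tail;
} InOrderQueue;

/* Enqueue only records the command and returns immediately; nothing runs yet.
   Returns 0 on success, -1 if the queue is full. */
int enqueue(InOrderQueue *q, CmdKind kind, int p) {
    if (q->tail == QUEUE_CAPACITY) return -1;
    q->cmds[q->tail].kind = kind;
    q->cmds[q->tail].p = p;
    q->tail++;
    return 0;
}

/* Assumed semantics: f writes image0 * p into its output image,
   g appends image1 + image2 to the buffer. */
void execute(Device *d, Command c) {
    switch (c.kind) {
    case CMD_F_IMAGE1: d->image1 = d->image0 * c.p; break;
    case CMD_F_IMAGE2: d->image2 = d->image0 * c.p; break;
    case CMD_G:
        if (d->buffer_len < 3)
            d->buffer[d->buffer_len++] = d->image1 + d->image2;
        break;
    }
}

/* Models the final blocking read: run every pending command in submission
   order, each one completing before the next starts, then copy the buffer out.
   Returns the number of values copied. */
size_t blocking_read(InOrderQueue *q, Device *d, int *out, size_t cap) {
    while (q->head < q->tail)
        execute(d, q->cmds[q->head++]);
    size_t n = d->buffer_len < cap ? d->buffer_len : cap;
    for (size_t i = 0; i < n; i++)
        out[i] = d->buffer[i];
    return n;
}

/* Enqueue the three steps back to back, with no finish between them,
   then do a single blocking read at the end. */
size_t run_three_steps(int image0, int out[3]) {
    Device d = { .image0 = image0 };
    InOrderQueue q = { .head = 0, .tail = 0 };
    for (int step = 0; step < 3; step++) {
        /* 9 commands in total, which fits in QUEUE_CAPACITY */
        enqueue(&q, CMD_F_IMAGE1, 2 * step + 1);  /* f(image0, image1, 1|3|5) */
        enqueue(&q, CMD_F_IMAGE2, 2 * step + 2);  /* f(image0, image2, 2|4|6) */
        enqueue(&q, CMD_G, 0);                    /* g(image1, image2, buffer) */
    }
    return blocking_read(&q, &d, out, 3);
}
```

With image0 set to 10, run_three_steps fills out with 30, 70 and 110: g in step 1 sees the images written by step 1, not the ones step 2 writes afterwards, even though nothing in the model waits between steps.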

vincentfpgarcia · February 5, 2013, 7:56am

Yep, thank you!