Failing at ping-pong buffer scheme... assuming the image buffer is read before the enqueued kernels have completed

Hi, I am still very new to GPUs and OpenCL.

It looks like my code has a problem with memory ordering (I assume…).

I have implemented several filters as separate kernels and am using two image buffers in a ping-pong scheme as the input and output of the kernels. However, some of the kernels are ‘ignored’ (their output is not picked up in the correct order, although they are still executed according to the debug output). What should I do, or what kind of condition should I set, to synchronize them?

The kernel setup code looks like this, where InBuffer and OutBuffer are also image buffers, in the format

const cl::ImageFormat format(CL_RGBA, CL_UNSIGNED_INT8);

    const cl::ImageFormat format(CL_RGBA, CL_HALF_FLOAT);
    YUVImg1 = cl::Image2D(context, CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS, format, m_width, m_height, 0, 0);
    YUVImg2 = cl::Image2D(context, CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS, format, m_width, m_height, 0, 0);
    DeNoiseImage = cl::Image2D(context, CL_MEM_READ_WRITE, format, m_width, m_height, 0, 0);

    R2Y_kernel = cl::Kernel(program, "rgba2yuv");
    R2Y_kernel.setArg(0, InBuffer);
    R2Y_kernel.setArg(1, YUVImg2);

    Median_kernel = cl::Kernel(program, "MedianFilter");
    Median_kernel.setArg(0, YUVImg2);
    Median_kernel.setArg(1, YUVImg1);

    HPass_x_kernel = cl::Kernel(program, "HighPass_x");
    HPass_x_kernel.setArg(0, YUVImg1);
    HPass_x_kernel.setArg(1, YUVImg2);

    HPass_y_kernel = cl::Kernel(program, "HighPass_y");
    HPass_y_kernel.setArg(0, YUVImg2);
    HPass_y_kernel.setArg(1, YUVImg1);

    Y2R_kernel = cl::Kernel(program, "yuv2rgba");
    Y2R_kernel.setArg(0, YUVImg1);
    Y2R_kernel.setArg(1, OutBuffer);
    Y2R_kernel.setArg(2, DeNoiseImage);
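
To make the ping-pong wiring above easier to check, here is a minimal host-only sketch in plain C++ (no OpenCL involved). It copies the input/output pairs from the `setArg` calls, verifies that every stage reads the buffer the previous stage wrote, and simulates in-order execution by letting each stage append its name to a string. The `Stage` struct and the functions `postWiring`, `chainIsConsistent` and `runInOrder` are hypothetical names made up for this illustration, and the extra `DeNoiseImage` argument of `yuv2rgba` is left out for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One pipeline stage: reads buffer `in`, writes buffer `out`.
// (Hypothetical helper type, not part of the original code.)
struct Stage {
    std::string name;
    std::string in;
    std::string out;
};

// Input/output pairs copied from the setArg calls in the post.
std::vector<Stage> postWiring() {
    return {
        {"rgba2yuv",     "InBuffer", "YUVImg2"},
        {"MedianFilter", "YUVImg2",  "YUVImg1"},
        {"HighPass_x",   "YUVImg1",  "YUVImg2"},
        {"HighPass_y",   "YUVImg2",  "YUVImg1"},
        {"yuv2rgba",     "YUVImg1",  "OutBuffer"},
    };
}

// True if no stage reads and writes the same buffer and every stage
// reads exactly the buffer that the previous stage wrote.
bool chainIsConsistent(const std::vector<Stage>& stages) {
    for (std::size_t i = 0; i < stages.size(); ++i) {
        if (stages[i].in == stages[i].out) return false;
        if (i > 0 && stages[i].in != stages[i - 1].out) return false;
    }
    return true;
}

// Stand-in for in-order execution: each "kernel" takes the contents of
// its input buffer and stores them, with its own name appended, in its
// output buffer. Returns the final contents of OutBuffer.
std::string runInOrder(const std::vector<Stage>& stages) {
    std::map<std::string, std::string> buffers;
    buffers["InBuffer"] = "src";
    for (const Stage& s : stages) {
        // The input must have been written by an earlier stage.
        assert(buffers.count(s.in) == 1);
        const std::string result = buffers[s.in] + ">" + s.name;
        buffers[s.out] = result;
    }
    return buffers["OutBuffer"];
}
```

With the wiring from the post, `chainIsConsistent(postWiring())` holds and `runInOrder(postWiring())` ends with all five stage names in queue order, so the buffer assignment itself looks consistent as long as the commands really run one after another.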

and here comes the command queue code:

    cl::size_t<3> origin;
    origin[0] = 0;
    origin[1] = 0;
    origin[2] = 0;
    cl::size_t<3> region;
    region[0] = m_width;
    region[1] = m_height;
    region[2] = 1;

    commandQueue.enqueueNDRangeKernel(
                R2Y_kernel,
                cl::NullRange,
                cl::NDRange(m_width, m_height),
                cl::NDRange(m_blockSizeX, m_blockSizeY),
                NULL);

    commandQueue.enqueueNDRangeKernel(
                Median_kernel,
                cl::NullRange,
                cl::NDRange(m_width, m_height),
                cl::NDRange(16, 4),
                NULL);

    commandQueue.enqueueNDRangeKernel(
                HPass_x_kernel,
                cl::NullRange,
                cl::NDRange(m_width, m_height),
                cl::NDRange(m_blockSizeX, m_blockSizeY),
                NULL);

    commandQueue.enqueueNDRangeKernel(
                HPass_y_kernel,
                cl::NullRange,
                cl::NDRange(m_width, m_height),
                cl::NDRange(m_blockSizeX, m_blockSizeY),
                NULL);

    commandQueue.enqueueNDRangeKernel(
                Y2R_kernel,
                cl::NullRange,
                cl::NDRange(m_width, m_height),
                cl::NDRange(m_blockSizeX, m_blockSizeY),
                NULL);
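
One thing worth ruling out with the enqueue calls above: they pass explicit local sizes (`m_blockSizeX` x `m_blockSizeY`, and a fixed 16 x 4 for the median filter) and do not check the returned `cl_int`. Under OpenCL 1.x rules, `clEnqueueNDRangeKernel` fails with `CL_INVALID_WORK_GROUP_SIZE` when the global size is not evenly divisible by an explicitly given local size. Whether that applies here depends on `m_width`, `m_height` and the block sizes, which are not shown, so the following is only a sketch of the usual round-up arithmetic. `roundUpToMultiple` and `isDivisible` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Round `size` up to the next multiple of `multiple`.
// Intended for computing a padded global size per dimension.
std::size_t roundUpToMultiple(std::size_t size, std::size_t multiple) {
    assert(multiple > 0);
    const std::size_t remainder = size % multiple;
    return remainder == 0 ? size : size + (multiple - remainder);
}

// True if `globalSize` is evenly divisible by `localSize`, which is
// what OpenCL 1.x requires when a local size is given explicitly.
bool isDivisible(std::size_t globalSize, std::size_t localSize) {
    return localSize != 0 && globalSize % localSize == 0;
}
```

If a padded global size is used, for example `cl::NDRange(roundUpToMultiple(m_width, 16), roundUpToMultiple(m_height, 4))`, each kernel needs a bounds check so that the extra work-items return without touching the image.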

The current output is unfiltered image frames. However, according to the debug messages, the commands are executed in the order they were queued.

The three filter kernels Median_kernel, HPass_x_kernel and HPass_y_kernel work properly if I enqueue only one of them and comment out the other two, so the problem should not be in the kernel code.

I also tried using separate new image buffers as input/output for each kernel, and that works too. But to me that is a stupid way to solve this problem :(

Another strange thing is that in the main calculation chain, for other kernels (also image filters), I was using the same ping-pong scheme and it works fine. The only difference is that almost every one of those filters has its own class in the host code; however, the same (read and write) image buffer pointers are passed to the classes, so it should still be the same concept :(

Hardware info:
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0 
  Driver version:				 2874.0 (HSA1.1,LC)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 

Thank you so much in advance for any help… I have been stuck here for a few days already. :cry::cry: