Let’s assume the following kernel:
#define nx (signed)get_global_id(0)
#define ny (signed)get_global_id(1)
#define Nx (signed)get_global_size(0)
#define Ny (signed)get_global_size(1)
__kernel void parallelSum(__global float* matrix, __global float* sum) {
    sum[0] += matrix[nx + Nx * ny]; // every work item reads and writes sum[0]
}
All work items have to access sum at some point. But this can happen only sequentially. So actually there is not much parallelization in this example, right?
It won’t work as intended, although it won’t throw any errors. I imagine that’s because, without synchronization of some sort, the work items race on the sum variable, so the accumulation never happens in anything resembling sequential order.
Actually, it does. I thought OpenCL took care of synchronization in such cases. Am I wrong?
You’re right. It doesn’t work like I thought. Thanks.
This is a reduction operation. There are ways to parallelize reductions, but they are obviously more complex than doing it serially. Search for “OpenCL parallel reduction” for ideas. “Divide and conquer” and use fast shared memory for partial results. Save the atomic operations for the last step.
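For illustration, here is a minimal sketch of what such a two-stage reduction kernel could look like. Everything here is an assumption rather than code from the thread: the kernel name partialSum, the 1D indexing, and the requirement that the host picks a power-of-two work-group size and passes a matching __local buffer.

```c
// Sketch only: assumes a power-of-two work-group size and a __local
// scratch buffer of that size allocated by the host via clSetKernelArg.
__kernel void partialSum(__global const float* input,
                         __global float* partial,
                         __local float* scratch)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    // Each work item copies one element into fast local (shared) memory.
    scratch[lid] = input[gid];
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction within the work group: halve the active items each step.
    for (int offset = get_local_size(0) / 2; offset > 0; offset >>= 1) {
        if (lid < offset)
            scratch[lid] += scratch[lid + offset];
        barrier(CLK_LOCK_LOCAL_MEM_FENCE == 0 ? CLK_LOCAL_MEM_FENCE : CLK_LOCAL_MEM_FENCE);
    }

    // One work item per group writes that group's partial sum.
    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}
```

The host (or a tiny follow-up kernel, possibly using atomics as the answer suggests) then adds up the partial array, which has only one entry per work group.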