Mandelbrot: Convert CUDA code to OpenCL code

Hi guys!

I have an assignment where I need to convert a Mandelbrot program written without OpenCL into one that uses OpenCL. My professor only started talking about OpenCL in the penultimate lecture, and in the last lecture we already had to write a Mandelbrot program, which in my opinion is not trivial at all, even without OpenCL.

I have the CUDA version of the program I need to convert, but I’m not familiar with CUDA either, or with this kind of tool in general.

I think my professor went a little overboard this time, and I’m completely lost.

Starting from the kernel written in CUDA, which is the following:

__global__ void
mandelbrot_kernel(unsigned char* pixels, int width, int height, int channels,
                  int rowstride, double re_min, double im_min,
                  double resolution, unsigned limit)
{
  // One thread per pixel: global 2D coordinates of this thread.
  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;
  // Threads in the padding beyond the image edge have nothing to do.
  if (x >= width || y >= height)
    return;

  pixels += y * rowstride + x * channels;
  double c_re = re_min + x * resolution;
  double c_im = im_min + y * resolution;

  unsigned rounds;
  double z_re = 0;
  double z_im = 0;

  // Iterate z = z^2 + c until |z|^2 >= 4 or the iteration limit is reached.
  for (rounds = limit - 1; rounds > 0; --rounds) {
    double z_re_tmp = z_re * z_re - z_im * z_im + c_re;

    z_im = 2 * z_re * z_im + c_im;
    z_re = z_re_tmp;

    if (z_re * z_re + z_im * z_im >= 4.0) {
      // Scale the escape iteration count into the 0..255 range.
      rounds = (limit - rounds) * 256 / limit;
      break;
    }
  }

  // Points that never escaped leave the loop with rounds == 0.
  for (int c = 0; c < channels; ++c)
    pixels[c] = rounds;
}

For example, I don’t understand exactly what the first two lines mean:

  int x = blockIdx.x * blockDim.x + threadIdx.x;
  int y = blockIdx.y * blockDim.y + threadIdx.y;

From my understanding threads are work-items in OpenCL (which are grouped together in workgroups). What are blocks? Are blocks workgroups in OpenCL? How would I translate the previous code to the corresponding OpenCL code?

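Yes, a CUDA block corresponds to an OpenCL work-group, and a CUDA thread to a work-item.

int x = blockIdx.x * blockDim.x + threadIdx.x;
equates to
int x = get_group_id(0) * get_local_size(0) + get_local_id(0);
or simply (when no global offset is used)
int x = get_global_id(0);

Correspondingly, int y = get_global_id(1);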
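
Putting those two lines together, a minimal sketch of the whole kernel in OpenCL C could look like the following. Only the qualifiers (__kernel, __global) and the index computation change; the rest is the CUDA body, with x >= width as the edge guard, since x == width is already outside the image. The #else branch is my own addition and not part of OpenCL: fake_gid and the stub get_global_id are hypothetical stand-ins so the same source can be compiled and stepped through with an ordinary C compiler. The cl_khr_fp64 pragma is there because double support is optional on OpenCL devices; I am assuming yours has it.

```c
#ifdef __OPENCL_VERSION__
/* Real OpenCL C: double needs fp64 support on the device. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#else
/* Host-side shim (hypothetical, for dry runs only): lets a plain C compiler
   build the kernel. fake_gid stands in for the work-item's global ID. */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t fake_gid[2];
static size_t get_global_id(unsigned dim)
{
  assert(dim < 2);
  return fake_gid[dim];
}
#endif

__kernel void
mandelbrot_kernel(__global unsigned char* pixels, int width, int height,
                  int channels, int rowstride, double re_min, double im_min,
                  double resolution, unsigned limit)
{
  /* CUDA: blockIdx.x * blockDim.x + threadIdx.x (and likewise for y). */
  int x = (int)get_global_id(0);
  int y = (int)get_global_id(1);
  /* The global size may be padded past the image, so guard the edges. */
  if (x >= width || y >= height)
    return;

  pixels += y * rowstride + x * channels;
  double c_re = re_min + x * resolution;
  double c_im = im_min + y * resolution;

  unsigned rounds;
  double z_re = 0;
  double z_im = 0;

  /* Same escape-time loop as the CUDA kernel. */
  for (rounds = limit - 1; rounds > 0; --rounds) {
    double z_re_tmp = z_re * z_re - z_im * z_im + c_re;

    z_im = 2 * z_re * z_im + c_im;
    z_re = z_re_tmp;

    if (z_re * z_re + z_im * z_im >= 4.0) {
      rounds = (limit - rounds) * 256 / limit;
      break;
    }
  }

  for (int c = 0; c < channels; ++c)
    pixels[c] = (unsigned char)rounds;
}
```

On a real device the OpenCL compiler defines __OPENCL_VERSION__, so the shim disappears and get_global_id comes from the runtime.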

Another thing to watch for: in CUDA the host specifies the block size and the number of blocks. In OpenCL you specify the global size (the total number of work-items in each dimension) and, optionally, the work-group (block) size.
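
To make that launch-size difference concrete, here is a small host-side sketch in plain C. The helper names (round_up, global_from_cuda, ndrange_for_image) are hypothetical, and it assumes you want the global size to be a whole multiple of the work-group size, which OpenCL 1.x requires whenever you pass an explicit local size.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m. */
static size_t round_up(size_t n, size_t m)
{
  assert(m > 0);
  return (n + m - 1) / m * m;
}

/* CUDA states blocks-per-grid and threads-per-block; OpenCL wants the
   product, i.e. the total number of work-items in that dimension. */
static size_t global_from_cuda(size_t num_blocks, size_t block_size)
{
  return num_blocks * block_size;
}

/* Hypothetical helper: global size that covers a width x height image
   with whole work-groups of size local[0] x local[1]. */
static void ndrange_for_image(int width, int height,
                              const size_t local[2], size_t global[2])
{
  global[0] = round_up((size_t)width, local[0]);
  global[1] = round_up((size_t)height, local[1]);
}
```

For a 1000x600 image and local = {16, 16}, this gives global = {1008, 608}, which in CUDA terms is 63 x 38 blocks of 16 x 16 threads. The two arrays are what you would pass as global_work_size and local_work_size to clEnqueueNDRangeKernel; the extra work-items past the image edge are the ones the bounds check in the kernel filters out.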