Simple test kernel for particle system behaving strangely


I’m trying to get a particle system working with OpenCL/OpenGL interop. The problem is that my kernel behaves strangely. As a test, I’m just trying to move all points of a grid along the y-axis. The result is that 1/4 of the points do exactly what I want, but the rest race out of view in the y, x and z directions.

My kernel looks as follows:

__kernel void particles(__global float4* position, float time)
{
	unsigned int i = get_global_id(0);
	float4 pos = position[i];
	pos += (float4)(0.0f, 0.01f, 0.0f, 1.0f);
	position[i] = pos;
}

Interestingly, if I change this one line to

pos.y += 0.01f;

which, in my understanding, should do exactly the same thing, the grid is still split into 4 parts: three of them move in different directions at the same speed, and one stays in its starting position. Does anyone know why the points of the grid don’t all behave the same way?

There also seems to be a difference when I change the w-value of pos. I use vec3 values in the host program, so I was under the impression that changing the w-value here wouldn’t make much of a difference, but I seem to be mistaken. Can someone explain this to me?

My host program looks as follows (without all the debugging information):

Particles::Particles(int resolution) {
	m_res = resolution;
	m_modelMatrix = glm::mat4(1.0f);
	m_time = 0.0f;
}

Particles::~Particles() {
	// Release the GL objects created in createBuffers()
	glDeleteVertexArrays(1, &m_vaoH);
	glDeleteBuffers(1, &m_vboH[0]);
	glDeleteBuffers(1, &m_vboH[1]);
}

void Particles::create(int res) {

	// Build a res x res grid of points in the y = 0 plane
	for(int z = 0; z < m_res; z++){
		for(int x = 0; x < m_res; x++){
			vertices.push_back(glm::vec3((float)x/(m_res), 0,(float)z/(m_res)));
		}
	}
}

void Test::createBuffers() {

	glGenVertexArrays(1, &m_vaoH);

	glGenBuffers(2, m_vboH);
	glBindBuffer(GL_ARRAY_BUFFER, m_vboH[0]);
	glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(glm::vec3), &vertices[0], GL_DYNAMIC_DRAW);

	glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);

	glBindBuffer(GL_ARRAY_BUFFER, m_vboH[1]);
	glBufferData(GL_ARRAY_BUFFER, normals.size() * sizeof(glm::vec3), &normals[0], GL_DYNAMIC_DRAW);

	glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, 0);

	glBindBuffer(GL_ARRAY_BUFFER, 0);
}


void Particles::setUpOpenCL() {
	err = cl::Platform::get(&platforms);
	err = platforms[0].getDevices(CL_DEVICE_TYPE_GPU, &devices);

	// Create the CL context from the share group of the current GL context (macOS)
	CGLContextObj kCGLContext = CGLGetCurrentContext();
	CGLShareGroupObj kCGLShareGroup = CGLGetShareGroup(kCGLContext);
	cl_context_properties props[] = {
		CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE, (cl_context_properties)kCGLShareGroup, 0
	};
	context = clCreateContext(props, 1, (cl_device_id*)&devices.front(), NULL, NULL, &err);
	cmdQ = cl::CommandQueue(context, devices.front(), 0, &err);

	// Load the kernel source and create the program
	std::ifstream sourceFile("src/shaders/");
	std::string sourceCode(std::istreambuf_iterator<char>(sourceFile), (std::istreambuf_iterator<char>()));
	cl::Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length()+1));
	program = cl::Program(context, source);
	err =;

	std::cout << "# Build Status: " << program.getBuildInfo<CL_PROGRAM_BUILD_STATUS>(devices[0]) << std::endl;
	std::cout << "# Build Options:	" << program.getBuildInfo<CL_PROGRAM_BUILD_OPTIONS>(devices[0]) << std::endl;
	std::cout << "# Build Log:	 " << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0]).c_str() << std::endl;

	// Wrap both GL vertex buffers as CL buffers; the kernel only uses the positions
	kernel = cl::Kernel(program, "particles", &err);
	cl_vbos.push_back(cl::BufferGL(context, CL_MEM_READ_WRITE, m_vboH[0], &err));
	cl_vbos.push_back(cl::BufferGL(context, CL_MEM_READ_WRITE, m_vboH[1], &err));

	err = kernel.setArg(0, cl_vbos[0]);
}

void Particles::runKernel() {
	// Hand the shared buffers over to OpenCL
	err = cmdQ.enqueueAcquireGLObjects(&cl_vbos, NULL, NULL);

	kernel.setArg(1, m_time);

	// One work-item per grid point
	err = cmdQ.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(m_res*m_res), cl::NullRange, NULL, NULL);

	// Give the buffers back to OpenGL
	err = cmdQ.enqueueReleaseGLObjects(&cl_vbos, NULL, NULL);
}

void Particles::render() {

	glDrawArrays(GL_POINTS, 0, vertices.size());
}


I hope someone knows what my mistake is. Thanks in advance.

Your OpenGL position is made of 3 consecutive floats, whereas your OpenCL kernel expects a position made of 4 consecutive floats.
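The index arithmetic behind this mismatch can be sketched on the CPU. The helper below is hypothetical (it is not part of OpenCL or of the program in the question) and assumes a tightly packed buffer of 3 floats per point. It reports which point and which component the `pos.y += 0.01f` write of work-item `i` actually lands on when the same buffer is read as float4.

```cpp
#include <cstddef>

// Which packed-vec3 point and component a float4-indexed write lands on.
struct Hit {
    std::size_t point;      // index of the vec3 point in the GL buffer
    std::size_t component;  // 0 = x, 1 = y, 2 = z
};

// Work-item i writes the .y lane of float4 element i, i.e. float index 4*i + 1.
// In a packed vec3 buffer that float belongs to point (4*i + 1) / 3
// and to component (4*i + 1) % 3.
inline Hit hit_by_work_item(std::size_t i) {
    const std::size_t float_index = 4 * i + 1;
    return Hit{float_index / 3, float_index % 3};
}
```

Under these assumptions, work-items 0, 1, 2 and 3 hit point 0 in y, point 1 in z, point 3 in x and point 4 in y, and point 2 is never written. That repeats with a period of four points, which would match the four groups described in the question. Work-items past three quarters of the range index beyond the end of the packed buffer.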

So either store each position as 4 floats and pass a stride of 16 bytes to glVertexAttribPointer(), or declare the kernel argument as __global float* position and perform the update as position[3*i+1] += 0.01f; (please note that an OpenCL float3 has the same size as a float4, so a __global float3* argument wouldn’t help).
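As a rough CPU-side sketch of the two options, assuming plain `std::vector` storage in place of the GL buffer: `move_y_packed` mirrors the stride-3 indexing a `__global float*` kernel would use, and `make_grid` builds the grid from the question with four floats per point. Both function names and the `Float4` alias are made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Option 1: keep packed vec3 data and index it with a stride of 3 floats,
// as a kernel taking "__global float* position" would.
void move_y_packed(std::vector<float>& position, std::size_t points) {
    for (std::size_t i = 0; i < points; ++i) {
        position[3 * i + 1] += 0.01f;  // y component of point i
    }
}

// Option 2: store 4 floats per point so the layout matches float4.
using Float4 = std::array<float, 4>;
static_assert(sizeof(Float4) == 16, "expected 16 bytes per position");

// Same grid as in the question, with w = 1 marking each entry as a point.
std::vector<Float4> make_grid(int res) {
    std::vector<Float4> vertices;
    vertices.reserve(static_cast<std::size_t>(res) * res);
    for (int z = 0; z < res; ++z) {
        for (int x = 0; x < res; ++x) {
            vertices.push_back(Float4{static_cast<float>(x) / res, 0.0f,
                                      static_cast<float>(z) / res, 1.0f});
        }
    }
    return vertices;
}
```

With option 2, the size passed to glBufferData would be `vertices.size() * sizeof(Float4)`, and the kernel from the question can keep its `__global float4*` argument.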

Thanks for your answer. I changed the positions to vec4, with w = 1.0 for points. The host program and the kernel both work with vec4/float4 now. The result is different, but still far from what it should be. It now looks as shown in this screenshot:

Any ideas?

Your kernel must not modify the w coordinate. Remember that a 4D point (x, y, z, w) is equivalent to the 3D point (x/w, y/w, z/w) (to put it simply).
You can consider (x, y, z, 1) as a point and (x, y, z, 0) as a vector.

A translation adds a vector, so you should add (x, y, z, 0). In your kernel, that means adding (0.0f, 0.01f, 0.0f, 0.0f) instead of (0.0f, 0.01f, 0.0f, 1.0f).
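A small CPU sketch can make the effect of w concrete. The `Vec4` struct and the `add` and `project` helpers below are hypothetical stand-ins for float4 arithmetic and for the divide by w. They assume the simplified equivalence above and ignore the rest of the rendering pipeline.

```cpp
// Minimal stand-in for an OpenCL float4 / GLSL vec4.
struct Vec4 {
    float x, y, z, w;
};

// Component-wise addition, like "pos += (float4)(...)" in the kernel.
inline Vec4 add(Vec4 a, Vec4 b) {
    return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// The 3D point that (x, y, z, w) stands for, written again with w = 1.
// Assumes p.w != 0, i.e. p is a point and not a direction vector.
inline Vec4 project(Vec4 p) {
    return Vec4{p.x / p.w, p.y / p.w, p.z / p.w, 1.0f};
}
```

For example, starting from (0.5, 0, 0.5, 1), adding (0, 0.01, 0, 1) gives w = 2, so the point appears at x = 0.25 after the divide and keeps drifting toward the origin as w grows on every step. Adding (0, 0.01, 0, 0) leaves w at 1 and changes only y.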