Kernel compilation problem....

Craig · October 29, 2010, 2:17am

I’m having a weird problem when trying to create a program from source code…

This code works just fine in the program I’ve been working on:

const char * source = "__kernel void render(__global float *d_output, __global float *vertTemplate,\
__global float *normTemplate, __global float *offTemplate,\
__global unsigned int *cubeCoords,__global unsigned int *cubeCount)\
{\
unsigned int x = get_global_id(0);\
unsigned int y = get_global_id(1);\
float newCoord = (*(offTemplate+((x*3)+(y%3))))+(*(vertTemplate+y));\
unsigned int i = ((*(cubeCoords))*x)+y;\
*(d_output+i) = newCoord;\
*(d_output+(i+(((*(cubeCount))*(*(cubeCoords)))))) = (*(normTemplate+y));\
}\
}";

but this code (the same exact code but with array subscripts) doesn’t work:

const char * source = "__kernel void render(__global float *d_output, __global float *vertTemplate,\
__global float *normTemplate, __global float *offTemplate,\
__global unsigned int *cubeCoords,__global unsigned int *cubeCount)\
{\
unsigned int x = get_global_id(0);\
unsigned int y = get_global_id(1);\
float newCoord = offTemplate[(x*3)+(y%3)]+vertTemplate[y];\
unsigned int i = (cubeCoords*x)+y;\
d_output[i] = newCoord;\
d_output[i+(cubeCount*cubeCoords)] = normTemplate[y];\
}\
}";

The same happens when I change other things too, such as an unsigned int to a uint. I can’t use functions like barrier() inside the kernel source either… Any ideas as to why I’m getting these errors?

BTW, I’m a total beginner with OpenCL. I’m also using Visual C++ 2008 Express Edition

Craig · October 29, 2010, 3:27am

Minor editing error. I forgot to take out a curly brace that I already fixed before posting.

}
}";

is supposed to be

}";

in both of those kernels, but the same problem persists.

HolyGeneralK · October 29, 2010, 7:04am

A few questions to get us going…

Are you using AMD’s or NVIDIA’s OpenCL implementation?

What error code is the compilation returning?

Have you attempted to read the build log of the kernel?

david.garcia · October 29, 2010, 8:46am

What HolyGeneralK said plus the following. This code:

*(d_output+(i+(((*(cubeCount))*(*(cubeCoords)))))) = (*(normTemplate+y));\

Doesn’t do the same as:

d_output[i+(cubeCount*cubeCoords)] = normTemplate[y];\

Do you realize that both cubeCount and cubeCoords are pointers?
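
To make the difference concrete, here is a small sketch in plain host-side C++ (not OpenCL C). The parameter names mirror the kernel arguments, the function names are hypothetical, and the values used to exercise it are made up. The point is that *(p + n) and p[n] are interchangeable, so *(cubeCoords) becomes cubeCoords[0], not the bare pointer cubeCoords:

```cpp
#include <cstddef>

// Index computed the way the working kernel does it: both pointers are dereferenced.
std::size_t index_with_deref(const unsigned *cubeCoords, const unsigned *cubeCount,
                             unsigned i) {
    return i + (*(cubeCount)) * (*(cubeCoords));
}

// The same index in subscript form: *(p) is written p[0].
std::size_t index_with_subscript(const unsigned *cubeCoords, const unsigned *cubeCount,
                                 unsigned i) {
    return i + cubeCount[0] * cubeCoords[0];
}

// Writing i + (cubeCount * cubeCoords) instead would multiply two pointers,
// which is not a valid expression in C, C++ or OpenCL C.
```

Both functions return the same value for the same inputs; only the second kernel in the question drops the dereference.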

Craig · October 29, 2010, 9:38am

Not sure what you’re asking here, but I’ll give it my best shot:

- I have an NVIDIA GeForce GT 240 graphics card.
- I am only including the cl.h header file along with glew.h, glut.h, and SDL.h.
- I am NOT using the “ocl” APIs in the oclUtils.h header file, which I’ve heard is NVIDIA related.
- I downloaded the header files I’m currently using from the Khronos OpenCL Registry, under the OpenCL 1.0 section.

The error code is CL_INVALID_PROGRAM_EXECUTABLE, returned by the clCreateKernel() call.

I tried re-creating the problem after reading this question, but as I was re-creating the problem, I stumbled on a syntax error in my kernel… I was using the address when I should have used the value the address pointed to.

Example: When I was removing the pointer notation from all the arrays I also removed the * before one of my pointers. So *(cubeCoords) became cubeCoords.

As for not being able to use barrier(), I was looking at an outdated PDF (or a PDF with a typo) that said to use barrier(GLOBAL_MEM_FENCE) instead of barrier(CLK_GLOBAL_MEM_FENCE). And uint works just fine now…

I’m now using a 3D NDRange to calculate the vertex positions for all cube vertices on a 16 X 16 grid of cubes and it’s finally starting to work the way I want it to. Only problem I’m having now is that only an 8X8 grid of cubes is showing up instead of a 16X16 grid. I’m sure I’ll be able to figure it out though, I’ve gotten this far in less than a week.

Craig · October 29, 2010, 10:26am

Figured it out and I now have a 16X16 grid of cubes. Just had to change two lines of code:

From:
unsigned int x = get_global_id(0);
unsigned int y = get_global_id(1);

To:
int x = (int)get_global_id(0);
int y = (int)get_global_id(1);

Thanks guys for your help and your time
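
The thread does not show the 3D kernel, so the actual reason the cast helped is not known. One common way unsigned work-item indices go wrong, offered here only as a possible explanation, is a subtraction whose result should be negative: with unsigned arithmetic it wraps around to a very large value instead. A minimal sketch in plain C++ (the function names and the idea of a grid-centre offset are assumptions for illustration):

```cpp
#include <cstdint>

// Offset of a cell from the grid centre using unsigned arithmetic.
// When x < half, the result wraps around modulo 2^32.
std::uint32_t centred_unsigned(std::uint32_t x, std::uint32_t half) {
    return x - half;
}

// The same offset using signed arithmetic gives the expected negative value.
std::int32_t centred_signed(std::int32_t x, std::int32_t half) {
    return x - half;
}
```

For x = 3 and half = 8 the unsigned version yields 4294967291, while the signed version yields -5.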

kishoreraj05 · November 3, 2010, 1:29pm

Hi Friends,

Well, I am a beginner in OpenCL…

I am trying to sum a list of numbers, but I am getting an error:

Error: clBuildProgram(-11)

Please see my code and help me to solve it…

#include <iostream>
#include <cstdlib>
#include <fstream>
#include <string>
//#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
 #endif


#define MAT_SIZE 4096

using namespace std;

void err_check( int err, string err_code ) {
        if ( err != CL_SUCCESS ) {
                cout << "Error: " << err_code << "(" << err << ")" << endl;
                exit(-1);
        }
}

int main()
{
        cl_platform_id platform_id = NULL;
        cl_device_id device_id = NULL;
        cl_context context = NULL;
        cl_command_queue command_queue = NULL;
        cl_program program = NULL;
        cl_kernel kernel = NULL;
        cl_uint ret_num_devices;
        cl_uint ret_num_platforms;
        cl_int err;
 float mat_a[ MAT_SIZE ];
        for ( cl_int i = 0; i < MAT_SIZE; i++ ) {
                mat_a[i] = i;
        }

        // Step 01: Get platform/device information
        err = clGetPlatformIDs( 1, &platform_id, &ret_num_platforms );
        err_check( err, "clGetPlatformIDs" );

        // Step 02: Get information about the device
        err = clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_DEFAULT, 1, &device_id, &ret_num_devices );
        err_check( err, "clGetDeviceIDs" );

        // Step 03: Create OpenCL Context
        context = clCreateContext( NULL, 1, &device_id, NULL, NULL, &err );
        err_check( err, "clCreateContext" );

        // Step 04: Create Command Queue
        command_queue = clCreateCommandQueue( context, device_id, 0, &err );
        err_check( err, "clCreateCommandQueue" );

        // Step 05: Create memory objects and tranfer the data to memory buffer
        cl_mem idata, odata;
		idata = clCreateBuffer( context, CL_MEM_READ_ONLY, MAT_SIZE * sizeof(float), NULL, &err );
        err = clEnqueueWriteBuffer( command_queue, idata, CL_TRUE, 0, MAT_SIZE * sizeof(float), mat_a, 0, NULL, NULL );
        err_check( err, "clEnqueueWriteBuffer" );
		
		odata = clCreateBuffer(context, CL_MEM_READ_WRITE, ( (MAT_SIZE/256)*sizeof(float) ), NULL, &err);
		
		 // Step 06: Read kernel file
        ifstream file("par_sum_kernel.cl");
        string prog( istreambuf_iterator<char>( file ), ( istreambuf_iterator<char>() ) );
        const char *source_str = prog.c_str();
		        // Step 07: Create Kernel program from the read in source
        program = clCreateProgramWithSource( context, 1, (const char **) &source_str, 0, &err );
        err_check( err, "clCreateProgramWithSource" );

        // Step 08: Build Kernel Program
        err = clBuildProgram( program, 1, &device_id, NULL, NULL, NULL );
        err_check( err, "clBuildProgram" );

        // Step 09: Create OpenCL Kernel
        kernel = clCreateKernel( program, "sum", &err ); 
		err_check( err, "clCreateKernel" );

        // Step 10: Set OpenCL kernel argument
        err = clSetKernelArg( kernel, 0, sizeof( cl_mem ), (void *) &idata );
		err_check( err, "clSetKernelArg" );

		err = clSetKernelArg( kernel, 1, sizeof( cl_mem ), (void *) &odata );
		err_check( err, "clSetKernelArg" );

        // Step 11: Execute OpenCL kernel in data parallel
        size_t GWsize[] = { MAT_SIZE, 1, 1 };
		size_t LWsize[] = {256 , 1, 1};
        clEnqueueNDRangeKernel( command_queue, kernel, 1, NULL, GWsize, LWsize, 0, 0, 0 );
        err_check( err, "clEnqueueNDRangeKernel" );
//-----------------		
		 err = clSetKernelArg( kernel, 0, sizeof( cl_mem ), (void *) &idata );
		err_check( err, "clSetKernelArg" );

		err = clSetKernelArg( kernel, 1, sizeof( cl_mem ), (void *) &odata );
		err_check( err, "clSetKernelArg" );
		LWsize[0] = (MAT_SIZE/256);
		GWsize[0] = 1;

		clEnqueueNDRangeKernel( command_queue, kernel, 1, NULL, GWsize, LWsize, 0, 0, 0 );
        err_check( err, "clEnqueueNDRangeKernel" );

//--------------------
		// Step 12: Read (Transfer result) from the memory buffer
		float mat_b[LWsize[0]];
        err = clEnqueueReadBuffer( command_queue, odata, CL_TRUE, 0, MAT_SIZE * sizeof(float), mat_b, 0, NULL, NULL );

        // Step 13: Free objects
        err = clFlush( command_queue );
        err = clFinish( command_queue );
        err = clReleaseKernel( kernel );
        err = clReleaseProgram( program );
        err = clReleaseMemObject( idata );
        err = clReleaseMemObject( odata );

	  	err = clReleaseCommandQueue( command_queue );
        err = clReleaseContext( context );

        // Display result
		cout<<mat_b[0];
	
        return 0;
}

I think the error is in the kernel file, but I am not able to find it. Can someone please help?

__kernel void sum( __global float *idata, __global float *odata )
{
        int gid = get_global_id(0);
		int lid = get_local_id(0);
		int bid = get_group_id(0);
		
		__local float sdata[get_num_groups(0)];
		sdata[lid] = idata[gid];
		barrier(CLK_LOCAL_MEM_FENCE);

		for( int dist = get_local_size(0); dist>0; dist/=2 )
		{
			if(lid < dist){
				sdata[lid] += sdata[lid + dist];
				barrier(CLK_LOCAL_MEM_FENCE);
			}
		}
		if(lid == 0)
		odata[bid] += sdata[0];
 }

david.garcia · November 3, 2010, 5:17pm

The -11 error code value you get from clBuildProgram is called CL_BUILD_PROGRAM_FAILURE. That usually means that the source code you passed to clCreateProgramWithSource() is invalid.

Can you show us the contents of par_sum_kernel.cl?
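
The numeric codes printed by err_check() can be looked up in CL/cl.h. As a convenience, a small lookup like the one below turns the codes that appear in this thread into names. It is a sketch in plain C++ with no OpenCL dependency; error_name is a hypothetical helper rather than part of the OpenCL API, and only a handful of codes are listed:

```cpp
#include <string>

// Names for a few OpenCL error codes; the numeric values are those defined in CL/cl.h.
// error_name is a hypothetical helper, not an OpenCL function.
std::string error_name(int err) {
    switch (err) {
        case 0:   return "CL_SUCCESS";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -51: return "CL_INVALID_ARG_SIZE";
        default:  return "unknown error " + std::to_string(err);
    }
}
```

Printing error_name(err) next to the raw number in err_check() would make messages such as clBuildProgram(-11) easier to read.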

kishoreraj05 · November 3, 2010, 11:15pm

Hello David,

Thank you very much…

As you asked, here is par_sum_kernel.cl:

__kernel void sum( __global float *idata, __global float *odata )
{
        int gid = get_global_id(0);
      int lid = get_local_id(0);
      int bid = get_group_id(0);
      
      __local float sdata[get_num_groups(0)];
      sdata[lid] = idata[gid];
      barrier(CLK_LOCAL_MEM_FENCE);

      for( int dist = get_local_size(0); dist>0; dist/=2 )
      {
         if(lid < dist){
            sdata[lid] += sdata[lid + dist];
            barrier(CLK_LOCAL_MEM_FENCE);
         }
      }
      if(lid == 0)
      odata[bid] += sdata[0];
}

kishoreraj05 · November 3, 2010, 11:43pm

Hi David,

I tried to find the error using clGetProgramBuildInfo().

I got this error message:

<program source>:7:17: error: automatic variable qualified with an address space
__local float sdata[get_num_groups(0)];

__kernel void sum( __global float *idata, __global float *odata )
{
        int gid = get_global_id(0);
		int lid = get_local_id(0);
		int bid = get_group_id(0);
		
		__local float sdata[get_num_groups(0)]; //Line No: 7
		sdata[lid] = idata[gid];
		barrier(CLK_LOCAL_MEM_FENCE);

		for( int dist = get_local_size(0); dist>0; dist/=2 )
		{
			if(lid < dist){
				sdata[lid] += sdata1[lid + dist];
				barrier(CLK_LOCAL_MEM_FENCE);
			}
		}
		if(lid == 0)
		odata[bid] += sdata[0];
 }

Please reply…
Thank You!

kishoreraj05 · November 3, 2010, 11:44pm

One more thing: the error marker is pointing to sdata.

kishoreraj05 · November 4, 2010, 12:11am

Hi David,
I have made some changes, and now I am getting this error message:

Error: Arg2: clSetKernelArg(-51)

Host code:

#include <iostream>
#include <cstdlib>
#include <fstream>
#include <string>
//#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
 #endif


#define MAT_SIZE 4096

using namespace std;

void err_check( int err, string err_code ) {
        if ( err != CL_SUCCESS ) {
                cout << "Error: " << err_code << "(" << err << ")" << endl;
                exit(-1);
        }
}

int main()
{
        cl_platform_id platform_id = NULL;
        cl_device_id device_id = NULL;
        cl_context context = NULL;
        cl_command_queue command_queue = NULL;
        cl_program program = NULL;
        cl_kernel kernel = NULL;
        cl_uint ret_num_devices;
        cl_uint ret_num_platforms;
        cl_int err;
 float mat_a[ MAT_SIZE ];
        for ( cl_int i = 0; i < MAT_SIZE; i++ ) {
                mat_a[i] = i;
        }

        // Step 01: Get platform/device information
        err = clGetPlatformIDs( 1, &platform_id, &ret_num_platforms );
        err_check( err, "clGetPlatformIDs" );

        // Step 02: Get information about the device
        err = clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_DEFAULT, 1, &device_id, &ret_num_devices );
        err_check( err, "clGetDeviceIDs" );

        // Step 03: Create OpenCL Context
        context = clCreateContext( NULL, 1, &device_id, NULL, NULL, &err );
        err_check( err, "clCreateContext" );

        // Step 04: Create Command Queue
        command_queue = clCreateCommandQueue( context, device_id, 0, &err );
        err_check( err, "clCreateCommandQueue" );

        // Step 05: Create memory objects and tranfer the data to memory buffer
        cl_mem idata, odata;
		idata = clCreateBuffer( context, CL_MEM_READ_ONLY, MAT_SIZE * sizeof(float), NULL, &err );
        err = clEnqueueWriteBuffer( command_queue, idata, CL_TRUE, 0, MAT_SIZE * sizeof(float), mat_a, 0, NULL, NULL );
        err_check( err, "idata: clEnqueueWriteBuffer" );
		
		odata = clCreateBuffer(context, CL_MEM_READ_WRITE, ( (MAT_SIZE/256)*sizeof(float) ), NULL, &err);
		err_check( err, "odata: clEnqueueWriteBuffer" );

		cl_mem sdata;
		sdata = clCreateBuffer( context, CL_MEM_READ_WRITE, ( (MAT_SIZE/256)*sizeof(float) ), NULL, &err );
		err_check( err, "sdata: clEnqueueWriteBuffer" );

	//	cout<<sizeof(cl_mem)<<"  "<<(MAT_SIZE/256);
		 // Step 06: Read kernel file
        ifstream file("par_sum_kernel.cl");
        string prog( istreambuf_iterator<char>( file ), ( istreambuf_iterator<char>() ) );
        const char *source_str = prog.c_str();
		        // Step 07: Create Kernel program from the read in source
        program = clCreateProgramWithSource( context, 1, (const char **) &source_str, 0, &err );
        err_check( err, "clCreateProgramWithSource" );

        // Step 08: Build Kernel Program
        err = clBuildProgram( program, 1, &device_id, NULL, NULL, NULL );
		if (err = CL_BUILD_PROGRAM_FAILURE)
		{
			//	cout << "CL_BUILD_PROGRAM_FAILURE" ;
			size_t build_log_size=sizeof(char)*900;
			char * build_log=  new char[900];
			size_t *build_log_ret;
			err =  clGetProgramBuildInfo(program,device_id,CL_PROGRAM_BUILD_LOG,build_log_size,build_log,build_log_ret);
		//	err_check( err, "clGetProgramBuildInfo" );
			cout<<(build_log)<<endl;
/*
			for(int i=0;i<(*build_log_ret)/sizeof(char);i++){
				cout << build_log[i];
			}
*/
		}
        err_check( err, "clBuildProgram" );

        // Step 09: Create OpenCL Kernel
        kernel = clCreateKernel( program, "sum", &err ); 
		err_check( err, "clCreateKernel" );

        // Step 10: Set OpenCL kernel argument
        err = clSetKernelArg( kernel, 0, sizeof( cl_mem ), (void *) &idata );
		err_check( err, "Arg0: clSetKernelArg" );

		err = clSetKernelArg( kernel, 1, sizeof( cl_mem ), (void *) &odata );
		err_check( err, "Arg1: clSetKernelArg" );

		err = clSetKernelArg( kernel, 2, sizeof( cl_mem ), (void *) &sdata );
		if (err = CL_INVALID_ARG_SIZE)
			cout<<"Hellooooooooo"<<endl;
		err_check( err, "Arg2: clSetKernelArg" );

        // Step 11: Execute OpenCL kernel in data parallel
        size_t GWsize[] = { MAT_SIZE, 1, 1 };
		size_t LWsize[] = {256 , 1, 1};
        clEnqueueNDRangeKernel( command_queue, kernel, 1, NULL, GWsize, LWsize, 0, 0, 0 );
        err_check( err, "clEnqueueNDRangeKernel" );
//-----------------		
		 err = clSetKernelArg( kernel, 0, sizeof( cl_mem ), (void *) &idata );
		err_check( err, "clSetKernelArg" );

		err = clSetKernelArg( kernel, 1, sizeof( cl_mem ), (void *) &odata );
		err_check( err, "clSetKernelArg" );
		LWsize[0] = (MAT_SIZE/256);
		GWsize[0] = 1;

		clEnqueueNDRangeKernel( command_queue, kernel, 1, NULL, GWsize, LWsize, 0, 0, 0 );
        err_check( err, "clEnqueueNDRangeKernel" );

//--------------------
		// Step 12: Read (Transfer result) from the memory buffer
		float mat_b[LWsize[0]];
        err = clEnqueueReadBuffer( command_queue, odata, CL_TRUE, 0, MAT_SIZE * sizeof(float), mat_b, 0, NULL, NULL );

        // Step 13: Free objects
        err = clFlush( command_queue );
        err = clFinish( command_queue );
        err = clReleaseKernel( kernel );
        err = clReleaseProgram( program );
        err = clReleaseMemObject( idata );
        err = clReleaseMemObject( odata );

	  	err = clReleaseCommandQueue( command_queue );
        err = clReleaseContext( context );

        // Display result
		cout<<mat_b[0];
	
        return 0;
}

Kernel part: par_sum_kernel.cl

__kernel void sum( __global float *idata, __global float *odata, __local float *sdata )
{
        int gid = get_global_id(0);
		int lid = get_local_id(0);
		int bid = get_group_id(0);
		
		sdata[lid] = idata[gid];
		barrier(CLK_LOCAL_MEM_FENCE);

		for( int dist = get_local_size(0); dist>0; dist/=2 )
		{
			if(lid < dist){
				sdata[lid] += sdata[lid + dist];
				barrier(CLK_LOCAL_MEM_FENCE);
			}
		}
		if(lid == 0)
		odata[bid] += sdata[0];
 }

david.garcia · November 5, 2010, 2:54pm

You are seeing a CL_INVALID_ARG_SIZE error because that argument is declared as “__local float *sdata”. When a kernel argument is a __local pointer, the size you pass to clSetKernelArg() must be the number of bytes that you want the driver to allocate for that argument.

In your case, and looking at the kernel you are trying to run, it looks like the amount of memory you need in sdata is equal to sizeof(cl_float)*LWsize[0], so you should do this:

err = clSetKernelArg( kernel, 2, sizeof(cl_float)*LWsize[0], NULL);

Notice also how the last parameter to clSetKernelArg() must be NULL when the kernel argument you are setting is a __local pointer.
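
Separately from the __local argument, the posted host code tests the result with if (err = CL_INVALID_ARG_SIZE) and, earlier, if (err = CL_BUILD_PROGRAM_FAILURE). In C++ a single = is an assignment, so err is overwritten with that constant whatever the API call returned, and the following err_check() reports the constant. A minimal sketch in plain C++ (no OpenCL needed; the constant has the same value as CL_INVALID_ARG_SIZE in CL/cl.h, and the function names are hypothetical):

```cpp
const int INVALID_ARG_SIZE = -51;  // same value as CL_INVALID_ARG_SIZE in CL/cl.h

// Mimics the posted check: '=' assigns, so the original status is lost.
int check_with_assignment(int err) {
    if ((err = INVALID_ARG_SIZE)) {
        // Always taken, because -51 is non-zero.
    }
    return err;
}

// Intended check: '==' compares and leaves the status untouched.
int check_with_comparison(int err) {
    if (err == INVALID_ARG_SIZE) {
        // Taken only when the call really returned -51.
    }
    return err;
}
```

With a successful status of 0 as input, the first function returns -51 and the second returns 0, so using == keeps the real return value visible.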

kishoreraj05 · November 6, 2010, 6:14am

Thanks…

Now, I got it…

Now it is not showing any error, but it is returning 0.
Can you check the kernel? Is it a correct function for calculating the sum of the values of the array that is passed in *idata?
I don’t know why it is returning 0…

Thank you Dear for replying…
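
The thread does not record which change eventually produced the correct output. Looking at the posted kernel, one likely problem is the loop start: with dist = get_local_size(0), the first pass reads sdata[lid + dist], which is past the end of a local array sized for one work-group, and the barrier sits inside the if, so not every work-item reaches it. In addition, odata[bid] += sdata[0] accumulates into a buffer the host never initialises. The usual tree reduction starts at half the group size and keeps the barrier outside the branch. Below is a CPU-only C++ sketch of that pattern for a single work-group; reduce_group is a hypothetical name, and the inner loop stands in for the work-items that would run in parallel on the device:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates a tree reduction over one work-group's local memory.
// Assumes the group size (sdata.size()) is a power of two, e.g. 256.
float reduce_group(std::vector<float> sdata) {
    const std::size_t local_size = sdata.size();
    if (local_size == 0) return 0.0f;
    assert((local_size & (local_size - 1)) == 0);  // power-of-two precondition

    // Start at half the group size so that lid + dist stays in bounds.
    for (std::size_t dist = local_size / 2; dist > 0; dist /= 2) {
        for (std::size_t lid = 0; lid < dist; ++lid) {
            sdata[lid] += sdata[lid + dist];
        }
        // On the device, barrier(CLK_LOCAL_MEM_FENCE) would go here,
        // outside the if (lid < dist) branch, so every work-item reaches it.
    }
    return sdata[0];
}
```

In the kernel, work-item 0 would then store the group's result with a plain assignment, odata[bid] = sdata[0], rather than +=.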

kishoreraj05 · November 6, 2010, 10:15am

Thank You!!! David, Thank You very much…

With your support I am able to get the output…

Can you give me some idea about Parallel sorting in OpenCL…

To your knowledge, which is the best algorithm for parallel sorting?

david.garcia · November 6, 2010, 3:05pm

“Can you give me some idea about Parallel sorting in OpenCL”

I don’t know enough about that to give good advice. I would start by reading the papers here: http://gpgpu.org/tag/sorting