Is it possible to write a function in a kernel file?

I am trying to develop an algorithm for image processing on the GPU.

So far I have been able to implement convolution on the GPU.

The algorithm needs to call the convolution many times.


__kernel void xyz(some parameters) // the function xyz must also be executed on the GPU
{
	convolution(some parameters); // this function is called many times
}

float* convolution(some parameters)
{
	const int nWidth = get_global_size(0);

	const int xOut = get_global_id(0);
	const int yOut = get_global_id(1);

	const int xInTopLeft = xOut;
	const int yInTopLeft = yOut;

	float temp1 = 0;
	for (int r = 0; r < nFilterWidth; r++)
	{
		const int idxFtmp = r * nFilterWidth;

		const int yIn = yInTopLeft + r;
		const int idxIntmp = yIn * nInWidth + xInTopLeft;

		for (int c = 0; c < nFilterWidth; c++)
		{
			const int idxF  = idxFtmp  + c;
			const int idxIn = idxIntmp + c;
			temp1 += tempB[idxF] * pInput[idxIn];
		}
	} //for (int r = 0...
	const int idxOut = yOut * nWidth + xOut;

	img[idxOut] = temp1;
}


Is there a way to do this kind of thing, or a specific technique for it?

Please help me in this regard.

Thanks in advance.


If your question is “can a kernel call into another function?” then the answer is yes.

I’m also a little unclear on what exactly is being asked but I’ll try to answer.

You can define auxiliary functions within .cl files (.cl being the typical file extension used to store kernels).

You could easily do something such as:

// Note how this is not a kernel. It's a simple function that returns a float4.
// It is defined before the kernel so that it is declared before its first use.
float4 VectorAdd(float4 a, float4 b)
{
	return a + b;
}

__kernel void vector_add_gpu(
    __global const float4 *src_a,
    __global const float4 *src_b,
    __global float4 *result,
    const int num)
{
    const int idx = get_global_id(0);

    if (idx < num) {
        result[idx] = VectorAdd(src_a[idx], src_b[idx]);
    }
}
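
The same pattern applied to the convolution from the question might look roughly like the sketch below. It is written in plain C so it can be compiled and checked on the host, and it is only an illustration under a few assumptions: the names convolve_at and run_all are made up, the filter is assumed to be square, and the output is assumed to cover only the positions where the filter fits entirely inside the input. In a .cl file the pointer parameters would carry __global (or __constant) qualifiers, and the two loops in run_all would be replaced by a __kernel that takes x and y from get_global_id(0) and get_global_id(1).

```c
/* Helper (hypothetical name): computes one output pixel.
 * As in the question, the filter is applied without flipping.
 * In OpenCL C, pInput and pFilter would be __global const float *. */
float convolve_at(const float *pInput, const float *pFilter,
                  int nInWidth, int nFilterWidth,
                  int xOut, int yOut)
{
    float sum = 0.0f;
    for (int r = 0; r < nFilterWidth; r++) {
        const int idxFtmp  = r * nFilterWidth;
        const int idxIntmp = (yOut + r) * nInWidth + xOut;
        for (int c = 0; c < nFilterWidth; c++) {
            sum += pFilter[idxFtmp + c] * pInput[idxIntmp + c];
        }
    }
    return sum;
}

/* Host-side stand-in for the kernel (hypothetical name): each (x, y)
 * iteration plays the role of one work-item. In a .cl file this would be
 * a __kernel reading x and y from get_global_id(0) and get_global_id(1). */
void run_all(const float *pInput, const float *pFilter, float *pOutput,
             int nInWidth, int nFilterWidth, int nOutWidth, int nOutHeight)
{
    for (int y = 0; y < nOutHeight; y++) {
        for (int x = 0; x < nOutWidth; x++) {
            /* The helper can be called as many times as needed. */
            pOutput[y * nOutWidth + x] =
                convolve_at(pInput, pFilter, nInWidth, nFilterWidth, x, y);
        }
    }
}
```

The point of the split is that the helper returns a value instead of writing to the output buffer itself, so the calling kernel can invoke it several times (for example with different filters) and decide where each result goes.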

Dear all, thanks for your replies.

Sorry that I was not able to express my question clearly.

Let me make my question a bit clearer. I want to use a scenario like the one Mr. silicone_milk described.

But here I want the vector addition function to be a kernel, and vector_add_gpu to be either a function or a kernel.

My aim is that every function runs on the GPU.