I am trying to develop an algorithm for image processing on the GPU.

So far, I have been able to implement convolution on the GPU.

Now, my algorithm has to use that convolution many times.

For example:

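```
// Helper (non-kernel) function called from the kernel below.
// OpenCL C is C99-based, so it is defined before the kernel that calls it.
// Pointer arguments need the __global qualifier; without one they default to __private.
void convolution(__global const float* pInput, // input image, nInWidth pixels wide
                 __global const float* tempB,  // nFilterWidth x nFilterWidth filter
                 __global float* img,          // output image, get_global_size(0) pixels wide
                 const int nInWidth,
                 const int nFilterWidth)
{
    const int nWidth = get_global_size(0);
    const int xOut = get_global_id(0);
    const int yOut = get_global_id(1);
    const int xInTopLeft = xOut;
    const int yInTopLeft = yOut;
    float temp1 = 0;
    for (int r = 0; r < nFilterWidth; r++)
    {
        const int idxFtmp = r * nFilterWidth;
        const int yIn = yInTopLeft + r;
        const int idxIntmp = yIn * nInWidth + xInTopLeft;
        for (int c = 0; c < nFilterWidth; c++)
        {
            const int idxF = idxFtmp + c;
            const int idxIn = idxIntmp + c;
            temp1 += tempB[idxF] * pInput[idxIn];
        }
    } //for (int r = 0...
    const int idxOut = yOut * nWidth + xOut;
    img[idxOut] = temp1; // writes the result; nothing is returned
}

// The kernel xyz must also be executed on the GPU.
__kernel void xyz(__global const float* pInput,
                  __global const float* tempB,
                  __global float* img,
                  const int nInWidth,
                  const int nFilterWidth)
{
    convolution(pInput, tempB, img, nInWidth, nFilterWidth); // this function is called many times
}
```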
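To check the indexing of the `convolution` body without a GPU, here is a minimal CPU sketch in plain C. It is an illustration under assumptions, not OpenCL and not the original code. The hypothetical helper `convolve_pixel` takes `xOut`/`yOut` as arguments in place of `get_global_id(0)`/`get_global_id(1)`. The hypothetical `convolve_image` loops over the output grid to emulate one NDRange launch. Like the original, it assumes a square `nFilterWidth` x `nFilterWidth` filter, no padding (so the input is `nFilterWidth - 1` pixels wider and taller than the output), and no filter flip.

```c
/* Computes one output pixel, mirroring the per-work-item body of convolution():
   xOut/yOut stand in for get_global_id(0)/(1) and nWidth for get_global_size(0).
   No padding and no filter flip, matching the original loop. */
static void convolve_pixel(const float *pInput, const float *tempB, float *img,
                           int nInWidth, int nWidth, int nFilterWidth,
                           int xOut, int yOut)
{
    float temp1 = 0.0f;
    for (int r = 0; r < nFilterWidth; r++) {
        const int idxFtmp = r * nFilterWidth;                 /* start of filter row r */
        const int idxIntmp = (yOut + r) * nInWidth + xOut;    /* start of input row yOut + r */
        for (int c = 0; c < nFilterWidth; c++)
            temp1 += tempB[idxFtmp + c] * pInput[idxIntmp + c];
    }
    img[yOut * nWidth + xOut] = temp1;
}

/* Emulates one launch with (inW - fw + 1) x (inH - fw + 1) work-items:
   each (x, y) pair plays the role of one work-item. */
static void convolve_image(const float *in, int inW, int inH,
                           const float *filter, int fw, float *out)
{
    const int outW = inW - fw + 1;
    const int outH = inH - fw + 1;
    for (int y = 0; y < outH; y++)
        for (int x = 0; x < outW; x++)
            convolve_pixel(in, filter, out, inW, outW, fw, x, y);
}
```

As a usage example, running `convolve_image` on a 5x5 image of ones with a 3x3 filter of ones gives a 3x3 image of 9s. Feeding that result into a second call gives a single value of 81. Each pass shrinks the image by `nFilterWidth - 1` in each dimension because there is no padding. Chaining calls like this, with each output becoming the next input, is one way to apply the convolution several times.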

Is there a way to call a function like this from inside a kernel? Or is there a specific technique for doing this?

Please help me in this regard.

Thanks in advance.

Regards