Hello all,

The greatest difficulties I encounter are those with kernel writing. I am learning OpenCL for my job and as an exercise I am attempting to parallelize the explicit method for a 1D heat equation. The serial C implementation is actually very simple:

```
int i,j; // Discrete Mesh Indices
double s; // Numerical stability factor
double x; // Spatial dimension variable x
double StepSize_x; // Step size for x;
int NumMeshPoints_x; // Number of mesh points for x
double t; // Time dimension variable t
double StepSize_t; // Step size for t
int NumMeshPoints_t; // Max steps for t
// INITIALIZATION
NumMeshPoints_x = 9;
StepSize_t = 0.0025;
NumMeshPoints_t = 10;
StepSize_x = 1.0/(NumMeshPoints_x+1);
s = StepSize_t/pow(StepSize_x, 2 ); // Satisfy s < 0.25 for stability
double w[NumMeshPoints_x+1]; // Level Down Array
double v[NumMeshPoints_x+1]; // Level Array
// POPULATE THE FIRST LAYER GIVEN THE BOUNDARY CONDITION
for( i = 0; i < NumMeshPoints_x+1; i++ )
{
x = i*StepSize_x;
w[i] = 100;
}
// COMPUTE TEMPERATURES USING MESH GRID
t = 0;
for( j = 1; j <= NumMeshPoints_t; j++ )
{
// BOUNDARY CONDITIONS AT END POINTS
v[0] = 10;
v[NumMeshPoints_x+1] = 10;
for( i = 1; i < NumMeshPoints_x; i++ )
{
// POPULATE THE CURRENT LAYER
v[i] = s*w[i-1] + (1-2*s)*w[i] + s*w[i+1];
}
t = j*StepSize_t;
for( i = 0; i < NumMeshPoints_x+1; i++ )
{
// MAKE CURRENT LAYER SEED FOR THE NEXT COMPUTATION
w[i] = v[i];
}
}
```

So I am trying to make the meat of this program, the double for loop, into my kernel. So as it stands, each layer of the array v[] is calculated at a given time step t, so that stepping in the t direction expands the grid, layer by layer. However, to calculate a layer requires the previous layer calculated. So I had two thoughts about this and I wasn’t sure how to really go about this:

- Write a kernel that handles the two for loops, prototyped as such:

```
__kernel void heat( __global double *prevLayer,
__global double *currLayer,
int NumMeshPoints_x,
int NumMeshPoints_y );
```

but I just don’t understand what would happen here. So I understand that I can access the arrays through the get_global_id(0) OpenCL command but can have each thread compute one point in the layer, then wrap that in the for loop to time step it? Otherwise my second though was to

- Have the kernel only handle a single layer, then enqueue it in a for loop for the time steps.

I am having difficulties even realizing a problem space. In MPI there is the probelm that threads would have to communicate continuously to compute the end points of their respective segments of the layer. I don’t know if this is a problem in OpenCL. OpenCL programmers, I need your help! Please any advice as I am lost!