# Precision problem

Hi there, I have the following code that gives me a hard time.

    local_density = 0.0;
    for(kk = 0; kk < 9; kk++)
    {
        local_density += tmp_cells[pos].speeds[kk];
    }

    u_x = (tmp_cells[pos].speeds[1] + tmp_cells[pos].speeds[5] +
           tmp_cells[pos].speeds[8] - (tmp_cells[pos].speeds[3] +
           tmp_cells[pos].speeds[6] + tmp_cells[pos].speeds[7]))
          / local_density;
    u_y = (tmp_cells[pos].speeds[2] + tmp_cells[pos].speeds[5] +
           tmp_cells[pos].speeds[6] - (tmp_cells[pos].speeds[4] +
           tmp_cells[pos].speeds[7] + tmp_cells[pos].speeds[8]))
          / local_density;
    u_sq = u_x * u_x + u_y * u_y;
    u[1] =   u_x      ;
    u[2] =         u_y;
    u[3] = - u_x      ;
    u[4] =       - u_y;
    u[5] =   u_x + u_y;
    u[6] = - u_x + u_y;
    u[7] = - u_x - u_y;
    u[8] =   u_x - u_y;
    t1 = 2.0 * c_sq;
    d_equ[0] = w0 * local_density * (1.0 - u_sq / t1);
    t3 = w1 * local_density;
    t2 = t1 * c_sq;
    t1 = u_sq / t1;
    d_equ[1] = t3 * (1.0 + u[1] / c_sq + (u[1] * u[1]) / t2 - t1);
    d_equ[2] = t3 * (1.0 + u[2] / c_sq + (u[2] * u[2]) / t2 - t1);
    d_equ[3] = t3 * (1.0 + u[3] / c_sq + (u[3] * u[3]) / t2 - t1);
    d_equ[4] = t3 * (1.0 + u[4] / c_sq + (u[4] * u[4]) / t2 - t1);
    t3 = w2 * local_density;
    d_equ[5] = t3 * (1.0 + u[5] / c_sq + (u[5] * u[5]) / t2 - t1);
    d_equ[6] = t3 * (1.0 + u[6] / c_sq + (u[6] * u[6]) / t2 - t1);
    d_equ[7] = t3 * (1.0 + u[7] / c_sq + (u[7] * u[7]) / t2 - t1);
    d_equ[8] = t3 * (1.0 + u[8] / c_sq + (u[8] * u[8]) / t2 - t1);

    for(kk = 0; kk < 9; kk++)
    {
        cells[pos].speeds[kk] = (tmp_cells[pos].speeds[kk] + params->omega *
                                 (d_equ[kk] - tmp_cells[pos].speeds[kk]));
    }


My problem is that when this code runs under OpenCL it produces different (wrong) values compared to the serial execution, and after a few iterations of the three kernels it ends in a segmentation fault. (This kernel is one of the three; each of them just updates the cells and tmp_cells values using some calculations.) When I test the other two kernels on their own, the results are correct (identical to the serial execution), so I suspect that only the third kernel, and more specifically this piece of code, is causing the problem.

I have to say that the code that runs in the serial execution is exactly the same. No difference at all. What am I doing wrong here? Am I missing something?
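
Since the symptom is a difference between the serial and the OpenCL output, it may help to locate the first value that diverges rather than comparing whole runs. The following is only a minimal sketch of such a check: `first_mismatch` and `abs_d` are hypothetical helper names, `NSPEEDS` is assumed to be 9 (matching the loops above), and the tolerance is something the caller has to choose.

```c
#include <stddef.h>

#define NSPEEDS 9   /* assumption: matches the kk < 9 loops in the post */

typedef struct {
    double speeds[NSPEEDS];
} t_speed;

/* Absolute value without pulling in math.h. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/*
 * Compare a reference grid (e.g. from the serial run) with another grid
 * (e.g. read back from the OpenCL buffer).
 * Returns the flat index cell * NSPEEDS + kk of the first value whose
 * relative difference exceeds rel_tol, or -1 if all values agree.
 */
long first_mismatch(const t_speed *ref, const t_speed *out,
                    size_t ncells, double rel_tol)
{
    for (size_t i = 0; i < ncells; i++) {
        for (int kk = 0; kk < NSPEEDS; kk++) {
            double a = ref[i].speeds[kk];
            double b = out[i].speeds[kk];
            double diff = abs_d(a - b);
            double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);

            /* Written as !(x <= y) so that a NaN difference is
               reported as a mismatch instead of slipping through. */
            if (!(diff <= rel_tol * scale)) {
                return (long)(i * NSPEEDS + kk);
            }
        }
    }
    return -1;
}
```

Running this after each kernel, on the buffer read back to the host, would show at which cell and which speed the two versions start to disagree, and whether the first bad value is slightly off (rounding) or garbage or NaN (more likely a memory problem).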

As for the definition of the struct, it is the following:

    typedef struct {
        double speeds[NSPEEDS];
    } t_speed;


One last thing I observed is that when I change the value of only one of the speeds in the last loop, more of the results are as expected than when I change all of them. I really can't understand why this happens…

I also want to add a couple more things that I observed. After executing the above code (and also the two other kernels) inside the loop, it always gives a segmentation fault at one specific point (even when I compile the OpenCL code without optimizations).

Also, if I comment out the last loop

    for(kk = 0; kk < 9; kk++)
    {
        cells[pos].speeds[kk] = (tmp_cells[pos].speeds[kk] + params->omega *
                                 (d_equ[kk] - tmp_cells[pos].speeds[kk]));
    }


everything works fine (meaning no segmentation fault).

More news.

When I target the CPU for the execution everything works great. The results are as expected and I get no segmentation faults. So what is the difference, as far as the code is concerned, between running the OpenCL code on the CPU and on the GPU?
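
One CPU/GPU difference that is easy to rule out, given that the struct stores `double` values: double precision is an optional feature on OpenCL devices, advertised through the `cl_khr_fp64` extension. This is a possibility to check, not a diagnosis of the problem above. The sketch below assumes the space-separated extension string has already been obtained from `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`; `has_extension` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `name` appears as a whole, space-separated token in
 * `ext_list` (the format used by the CL_DEVICE_EXTENSIONS string).
 * A plain strstr() is not enough, because one extension name can be
 * a prefix of another.
 */
bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0) {
        return false;
    }
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return true;
        }
        p += len;   /* keep searching after this partial match */
    }
    return false;
}
```

Calling `has_extension(extensions, "cl_khr_fp64")` for both the CPU and the GPU device would show whether the two devices differ in double-precision support.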