Question regarding operator = and operator +=

I’m working on some code and I am getting DRASTIC performance differences between the following two versions of the same four lines:
c0o.xyzw += (float4)(fa,fb,fc,fd);
c1o.xyzw += (float4)(fa2,fb2,fc2,fd2);
c2o.xyzw += (float4)(fa3,fb3,fc3,fd3);
c3o.xyzw += (float4)(fa4,fb4,fc4,fd4);


c0o.xyzw = (float4)(fa,fb,fc,fd);
c1o.xyzw = (float4)(fa2,fb2,fc2,fd2);
c2o.xyzw = (float4)(fa3,fb3,fc3,fd3);
c3o.xyzw = (float4)(fa4,fb4,fc4,fd4);

The first one runs lightning fast (0.01 sec). The second one slows my kernel down to 18 seconds.

Note that c0o … c3o are uninitialized float4s. Is the compiler just discarding the memory write because it is writing to uninitialized memory? Does OpenCL initialize the stack variables at all?

Which OpenCL implementation are you using? What’s your hardware?

Does OpenCL initialize the stack variables at all?

In OpenCL C, as in C99, the contents of uninitialized private (i.e. stack) variables are undefined.
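
To illustrate why that matters for `+=`, here is a minimal sketch in plain C rather than the original kernel. The `vec4` struct and both helper functions are hypothetical stand-ins for OpenCL C's built-in `float4`; the only point is that the accumulator gets a defined value before the first `+=` reads it.

```c
#include <stddef.h>

/* Hypothetical plain-C stand-in for OpenCL C's built-in float4. */
typedef struct { float x, y, z, w; } vec4;

/* Component-wise a += b, mirroring `a.xyzw += b` in a kernel. */
static void vec4_add_assign(vec4 *a, vec4 b)
{
    a->x += b.x;
    a->y += b.y;
    a->z += b.z;
    a->w += b.w;
}

/* Sums n vectors. The accumulator is zeroed explicitly, because
 * `+=` reads the old value before writing the new one. */
static vec4 accumulate(const vec4 *terms, size_t n)
{
    vec4 acc = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; ++i)
        vec4_add_assign(&acc, terms[i]);
    return acc;
}
```

In the kernel itself the equivalent would presumably be a declaration such as `float4 c0o = (float4)(0.0f);` before the first `+=`, assuming the accumulators are meant to start at zero.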

I’m running on an AMD Radeon 6490HD in a MacBook Pro with Snow Leopard 10.6.6.

I’m not quite sure what is going on: I have tried compiling the program with no optimizations and it still shows this performance difference.

May I ask why you are reading from uninitialized variables on purpose?