Bad performance after moving from 32-bit to 64-bit

I’m working on both my computers, one of which has a 32-bit Win 7 OS and the other one has a 64-bit Win 7 OS. Both machines have the Intel HD Graphics 4000 GPU inside.
My kernel works fine on the 32-bit machine. However, when I transferred it to the 64-bit machine, it was still functionally correct but the performance was bad:
On the 32-bit machine, it takes 27 sec to perform the whole process while on the 64-bit one, it takes 111 sec to do the same thing.
I didn’t change the code, which is why I find it strange. I understand there are some differences between a 32-bit OS and a 64-bit one (like the size of a pointer, etc.), but since the final output is correct, I suppose they don’t affect my program here?
Has this happened to anyone else? Or did I miss something in the transfer?
Thanks in advance.

I have just updated my GPU driver to the latest version and the kernel failed to work!
I got a “clSetKernelArg function for memiRect failed.” error, and the error code is -51 (CL_INVALID_ARG_SIZE), which indicates a mismatch of arg_size.
The size of cl_mem is 8 bytes on a 64-bit OS and 4 bytes on a 32-bit OS. Is this the problem? What should I do to pass the arg in this situation?

Can you show us your clSetKernelArg call? Does it look like this:

cl_mem buf = clCreateBuffer…
clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

Or something different?
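
To show why the sizeof(cl_mem) form matters, here is a small sketch that does not call OpenCL at all. Everything in it is a hypothetical stand-in: fake_mem plays the role of cl_mem (an opaque pointer-sized handle), and set_arg_checked only mimics the kind of size check that can produce -51. I'm assuming, without having seen your code, that a fixed size of 4 might be passed somewhere; that is just one way such a mismatch could arise.

```c
#include <stddef.h>

/* Stand-in for an opaque handle such as cl_mem: a pointer to an
   incomplete struct, so its size follows the build's pointer width. */
typedef struct fake_mem_object *fake_mem;

/* Same numeric value as OpenCL's CL_INVALID_ARG_SIZE. */
#define FAKE_INVALID_ARG_SIZE (-51)

/* Hypothetical mock of an argument size check: the caller's arg_size
   must equal the size of the handle type, and the pointer must be set. */
int set_arg_checked(size_t arg_size, const void *arg_value)
{
    if (arg_value == NULL || arg_size != sizeof(fake_mem))
        return FAKE_INVALID_ARG_SIZE;
    return 0; /* success */
}

/* Fragile: hard-codes the handle size of a 32-bit build. */
int set_arg_hardcoded(const fake_mem *buf)
{
    return set_arg_checked(4, buf);
}

/* Portable: lets the compiler supply the handle size. */
int set_arg_portable(const fake_mem *buf)
{
    return set_arg_checked(sizeof(fake_mem), buf);
}
```

In a 32-bit build both helpers succeed, so the hard-coded size goes unnoticed; in a 64-bit build only set_arg_portable passes the check. The same reasoning applies to anything else sized by hand on the host side.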

Hi, kunze
Sorry that I forgot to update this thread. I finally solved this problem by updating the GPU’s driver on the 64-bit machine. I guess the old driver is why the performance was so bad. Still, thank you very much for your reply.