I have an OpenCL kernel where all the threads need to access the same 3x4 matrix. So, I am guessing constant memory would be the best place for such a matrix.

However, I am unable to find an example where constant memory is used. CUDA has cudaMemcpyToSymbol, which copies data from the host into constant memory.

How do I achieve something similar in OpenCL? Sorry for the n00b question.

Thank you for the reply. The problem is that the matrix is known only at run time, not at compile time. I read an image and an associated transformation matrix, and then I transform every pixel location in the image with that matrix. Hence, every thread needs to access the same matrix, but its values are available only once I get the input from the user.
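
To make the access pattern concrete, here is a minimal sketch of the per-pixel work. It is not working OpenCL host code: it pairs a plain C reference loop with OpenCL C source for what the kernel might look like. Everything named here is an assumption made for illustration: the names `transform_pixels`, `transform_pixels_ref` and `kTransformKernelSrc`, the row-major layout of the 12 matrix entries, lifting each pixel location (x, y) to the vector (x, y, 0, 1), and writing three floats per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OpenCL C source for the kernel. The matrix is a kernel
 * argument in the __constant address space, so its 12 values come from a
 * host-side buffer filled at run time rather than from a compile-time
 * initializer. Assumed layout: row-major, M[4 * row + col]. */
const char *const kTransformKernelSrc =
    "__kernel void transform_pixels(__constant float *M,\n"
    "                               __global float *out,\n"
    "                               int width)\n"
    "{\n"
    "    int x = get_global_id(0);\n"
    "    int y = get_global_id(1);\n"
    "    float p[4] = { (float)x, (float)y, 0.0f, 1.0f };\n"
    "    int base = 3 * (y * width + x);\n"
    "    for (int r = 0; r < 3; ++r) {\n"
    "        float acc = 0.0f;\n"
    "        for (int c = 0; c < 4; ++c)\n"
    "            acc += M[4 * r + c] * p[c];\n"
    "        out[base + r] = acc;\n"
    "    }\n"
    "}\n";

/* CPU reference for the same computation: every pixel reads the same
 * 12 floats, which is the access pattern described above. */
void transform_pixels_ref(const float M[12], float *out, int width, int height)
{
    assert(M != NULL && out != NULL && width > 0 && height > 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            /* Assumption: pixel location lifted to (x, y, 0, 1). */
            const float p[4] = { (float)x, (float)y, 0.0f, 1.0f };
            const int base = 3 * (y * width + x);

            for (int r = 0; r < 3; ++r) {
                float acc = 0.0f;
                for (int c = 0; c < 4; ++c)
                    acc += M[4 * r + c] * p[c];
                out[base + r] = acc;
            }
        }
    }
}
```

If this sketch is on the right track, the host side would presumably put the 12 floats into a buffer made with clCreateBuffer (for example with CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR) and pass it as the first kernel argument via clSetKernelArg, but that part is not shown or tested here.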