AES in OpenCL

Hello everyone.
I am working on an implementation of the AES algorithm (ECB mode) in OpenCL for my college diploma. So far I have implemented AES for serial execution, and it works very well. Now I need to make it run in parallel. I managed to build the project and run it, but it does not produce correct results.
I don't know how to restructure it so that it runs as fast as possible and uses all available threads on the GPU.

Here is my source code.

I have three files: aes.cpp, the main file of the project; a kernel file for parallel execution; and const.h, a header file with the constants I use. The kernel file is the one I need to change for encryption to work. Can someone explain how to implement the AddRoundKey, SubBytes and MixColumns operations in parallel?
I am reading about how to multiply matrices in OpenCL, and I hope that will help me somehow.
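
Because ECB encrypts every 16-byte block independently, one common approach is to give each work-item one whole block instead of splitting a single block across threads. The sketch below is plain C++ standing in for kernel code, not the code from this project: the function names, the column-major state layout and the `sbox` parameter (presumably the S-box table in const.h) are assumptions. The loop over `gid` plays the role of `get_global_id(0)` in an OpenCL kernel. The key schedule and the special first and last rounds are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiply by 2 in GF(2^8), reducing by the AES polynomial (0x11B).
inline std::uint8_t xtime(std::uint8_t a) {
    return static_cast<std::uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

// AddRoundKey: XOR the 16-byte state with the 16-byte round key.
inline void add_round_key(std::uint8_t* s, const std::uint8_t* rk) {
    for (int i = 0; i < 16; ++i) s[i] ^= rk[i];
}

// SubBytes: replace every byte via a 256-entry lookup table.
// `sbox` is passed in; it is assumed to be the table from const.h.
inline void sub_bytes(std::uint8_t* s, const std::uint8_t* sbox) {
    for (int i = 0; i < 16; ++i) s[i] = sbox[s[i]];
}

// ShiftRows: row r is rotated left by r positions.
// Assumed layout: column-major, so byte (row r, column c) is s[4*c + r].
inline void shift_rows(std::uint8_t* s) {
    std::uint8_t t[16];
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            t[4 * c + r] = s[4 * ((c + r) % 4) + r];
    for (int i = 0; i < 16; ++i) s[i] = t[i];
}

// MixColumns: each column is multiplied by the fixed matrix
// [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2] over GF(2^8).
inline void mix_columns(std::uint8_t* s) {
    for (int c = 0; c < 4; ++c) {
        std::uint8_t* col = s + 4 * c;
        const std::uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
        const std::uint8_t t = a0 ^ a1 ^ a2 ^ a3;
        col[0] = a0 ^ t ^ xtime(a0 ^ a1);
        col[1] = a1 ^ t ^ xtime(a1 ^ a2);
        col[2] = a2 ^ t ^ xtime(a2 ^ a3);
        col[3] = a3 ^ t ^ xtime(a3 ^ a0);
    }
}

// One middle round applied to every block. In a kernel, the loop body
// would be the kernel body and gid would come from get_global_id(0).
inline void middle_round_all_blocks(std::vector<std::uint8_t>& data,
                                    const std::uint8_t* round_key,
                                    const std::uint8_t* sbox) {
    const std::size_t n_blocks = data.size() / 16;
    for (std::size_t gid = 0; gid < n_blocks; ++gid) {
        std::uint8_t* state = data.data() + 16 * gid;  // this work-item's block
        sub_bytes(state, sbox);
        shift_rows(state);
        mix_columns(state);
        add_round_key(state, round_key);
    }
}
```

With this layout no work-item reads or writes another work-item's block, so no synchronization between threads is needed. Comparing each step against the serial version on a single block is a simple way to find where the parallel result starts to differ.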

Is it really possible that no one knows anything? Can someone just explain the basics of parallel execution to me? How do I find out how many threads my GPU has, how should I arrange those threads for best performance, how do I use them, and so on?
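
On the question of thread counts: OpenCL reports device limits through `clGetDeviceInfo`, for example `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`, and the number of work-items is chosen per launch through the global and local sizes passed to `clEnqueueNDRangeKernel`. The sketch below only covers the arithmetic for those sizes, assuming one work-item per 16-byte block. The helper names and the local size of 64 are illustrative assumptions, not values taken from this project.

```cpp
#include <cstddef>

// Number of 16-byte AES blocks in the input, i.e. the number of
// work-items that have real work (assumes one work-item per block).
inline std::size_t num_blocks(std::size_t n_bytes) {
    return n_bytes / 16;
}

// Round the work-item count up to a multiple of the work-group size.
// OpenCL 1.x requires the global size to be a multiple of the local
// size when a local size is given explicitly.
inline std::size_t round_up_global_size(std::size_t n_items,
                                        std::size_t local_size) {
    return ((n_items + local_size - 1) / local_size) * local_size;
}
```

For 16000 bytes of input this gives 1000 blocks, and with a local size of 64 the global size becomes 1024. The extra 24 work-items have no block to process, so the kernel needs a guard such as `if (gid < n_blocks)` before touching memory. Passing `NULL` as the local size lets the runtime pick one instead.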