How can I use OpenCL to optimize my decoder algorithm?

I have a decoder algorithm implemented in Visual Studio, and now I want to optimize its performance on an AMD APU. My questions are:
a. Do I need to use OpenCL heterogeneous programming to optimize my algorithm?
b. If I use OpenCL, how should I get started?
c. Which aspects of my algorithm can I optimize, and with what methods?