[newbie] OpenCL code within C++ program


I have an existing code base written in C++. It uses only platform-independent C++98 features and the Boost libraries; of the latter, only the threads library is not header-only.

The code has roughly the following structure:

input / data setup
operation A
operation B
operation C
operation …

where each operation has three components:

initialization [single threaded]
system update [multi-threaded]
finalization [single threaded]

Each system update consists of many (thousands of) individual operations performed in an essentially SIMD manner (though with if-conditions), and these updates account for by far the largest share of the run time. If there are n operations to be performed, this section is presently multi-threaded on the CPU (using the Boost threads library) by assigning each of the m CPU cores a workload of n / m operations. The maths operations are +, -, *, /, log, exp and sqrt, plus some Boost functions, but I suppose I could live with restricting that code section to plain C99 functionality.
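
To make the current scheme concrete, here is a minimal sketch of the n / m partitioning. It is not my actual code: std::thread stands in for boost::thread (so it needs C++11 to compile as written), and update_one, update_range and system_update are made-up names, with update_one being a placeholder for the real per-element operation.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-element operation: contains a branch and uses only
// C99-style maths functions (log, exp, sqrt).
inline double update_one(double x) {
    if (x > 1.0) {
        return std::log(x) + std::sqrt(x);
    }
    return std::exp(x) * 0.5;
}

// Update elements [begin, end) of data in place.
void update_range(std::vector<double>& data, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        data[i] = update_one(data[i]);
    }
}

// Split the n elements into m contiguous chunks of about n / m each
// and run one thread per chunk (std::thread used here in place of boost::thread).
void system_update(std::vector<double>& data, std::size_t m) {
    const std::size_t n = data.size();
    if (m == 0) {
        m = 1;
    }
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < m; ++t) {
        const std::size_t begin = n * t / m;
        const std::size_t end = n * (t + 1) / m;
        workers.push_back(std::thread(update_range, std::ref(data), begin, end));
    }
    // Wait for all chunks to finish before finalization runs.
    for (std::size_t t = 0; t < workers.size(); ++t) {
        workers[t].join();
    }
}
```

The loop body in update_range is the part I would hope to hand over to OpenCL; everything around it would stay ordinary C++.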

I would like to speed up the system updates using OpenCL instead of Boost threads. What I am asking here is whether it is possible to integrate OpenCL code as part of an overall C++ program (i.e. I have no ambition of rewriting the whole program within an OpenCL-restricted feature set!), and if so, how. At present I do not know how to program in OpenCL (and am hence not familiar with its technicalities), so I'd like to know whether it would be worth the effort of climbing the learning curve at all.
For example, would it be possible to place all the OpenCL code into C (or C++) wrapper functions and then invoke those from C++? Or, ideally, could OpenCL functions be integrated directly into the C++ code, with the compiler identifying them as OpenCL and deferring their compilation to the OpenCL SDK?

The number of operations to be performed is very large (literally billions). Hence invoking the OpenCL code section should not carry a heavy per-call cost (e.g. run-time compilation, or loading an external library on every separate invocation).