I am porting some CPU code to OpenCL and will end up with 20-odd different kernels. Should I compile all 20 kernels at the start of the program, keep an array of cl_kernel objects and call them as needed, or is there a better way to do this? Also, is it enough to call clBuildProgram for each kernel just once during the lifetime of the program?
Is it possible to build the kernels at compile time?
You can’t build kernels at compile time unless you’re only going to run them on a single vendor’s hardware (the same vendor whose SDK/driver you’re building with), and even then probably only on a limited range of hardware and driver revisions. Even if you could, it would limit you to pre-written code, and run-time code generation is a practical and worthwhile solution to some problems.
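One cheap form of run-time specialisation that an ahead-of-time build can’t give you is passing preprocessor definitions through the options argument of clBuildProgram. The sketch below only assembles such an options string, assuming a kernel source that reads a TILE_SIZE macro; the helper make_build_options and the TILE_SIZE macro are hypothetical, while -D and -cl-fast-relaxed-math are standard OpenCL build options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: assembles the `options` string for clBuildProgram.
 * TILE_SIZE is a made-up macro that the kernel source would use via
 * #if/#ifdef; its value is only known at run time (e.g. derived from a
 * device query), which an ahead-of-time build cannot take into account.
 * Returns 0 on success, -1 if the options did not fit into buf. */
static int make_build_options(char *buf, size_t n, int tile_size, int fast_math)
{
    assert(buf != NULL && n > 0);
    int len = snprintf(buf, n, "-D TILE_SIZE=%d%s", tile_size,
                       fast_math ? " -cl-fast-relaxed-math" : "");
    if (len < 0 || (size_t)len >= n)
        return -1; /* snprintf truncated the output */
    return 0;
}
```

The resulting string goes in as the options argument, e.g. clBuildProgram(program, 1, &device, opts, NULL, NULL), so the same source can be built into differently specialised programs depending on what the program finds at run time.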
But caching the binaries works pretty well and amounts to much the same thing, particularly with AMD’s implementation, whose binaries are the device-specific code, so loading one isn’t much more than a manual dlopen. AFAICT NVIDIA already caches binaries itself, and its binaries are a higher-level intermediate assembly language (PTX), which takes longer to load even when cached (though still faster than a full build). (I also recall someone mentioning that macOS just returns the filename of the .o file as the ‘binary’.) You could possibly pre-compile and cache at install time, but you’d still need a fallback build at run time in case the driver rejects the binary or the user changes their configuration, so there doesn’t seem much point in maintaining two build paths for no real benefit.
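A binary cache needs a key that changes whenever a cached binary could become invalid. The sketch below derives a cache file name by hashing the device name, driver version, build options and source; the helper cache_file_name, the file-name pattern and the choice of 64-bit FNV-1a are all assumptions. In real code the first two strings would come from clGetDeviceInfo (CL_DEVICE_NAME, CL_DRIVER_VERSION). On a hit you would load the file with clCreateProgramWithBinary and still call clBuildProgram on it; on a miss, or if either call fails, you build from source and save the binary obtained via clGetProgramInfo with CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a: a simple non-cryptographic hash, used here only to
 * derive a file name; any stable hash would do. */
static uint64_t fnv1a_update(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical helper: writes a name like "kcache-0123456789abcdef.bin"
 * into buf. The key mixes in everything that should invalidate a cached
 * binary: device name, driver version (in real code both come from
 * clGetDeviceInfo), build options and kernel source.
 * Returns 0 on success, -1 if buf is too small. */
static int cache_file_name(char *buf, size_t n,
                           const char *device_name, const char *driver_version,
                           const char *build_options, const char *source)
{
    const char *parts[] = { device_name, driver_version, build_options, source };
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < sizeof parts / sizeof parts[0]; i++) {
        assert(parts[i] != NULL);
        /* hash the terminating NUL too, so ("ab","c") and ("a","bc") differ */
        h = fnv1a_update(h, parts[i], strlen(parts[i]) + 1);
    }
    int len = snprintf(buf, n, "kcache-%016llx.bin", (unsigned long long)h);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

Because the driver version is part of the key, a driver update simply turns into a cache miss and falls through to the source build, which is the fallback path you need anyway.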
In my limited experience, doing all the compiles/loads at start-up gives the user a more natural experience; users have been used to long start-up times for as long as computers have existed. If you load on demand instead, you tend to get potentially long pauses in unexpected places. That assumes, of course, that you aren’t otherwise memory-constrained and that the start-up delay won’t be unacceptably long.
And finally, yes: you can just call clBuildProgram once per program (one program can contain many kernels), look up the kernel handles with clCreateKernel and keep them around for the life of the programme. Drivers may swap the code in and out of the device as necessary, but this is transparent to you.
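A minimal sketch of the build-once, keep-the-handles approach, assuming an enum of kernel IDs whose names match the __kernel functions in the program source. To keep it runnable without an OpenCL device, kernel creation and release are passed in as callbacks: in real code they would wrap clCreateKernel(program, name, &err) and clReleaseKernel, and the handles would be cl_kernel rather than void *. The kernel names, kernel_table and the demo_* stand-ins are all hypothetical; clCreateKernelsInProgram is an alternative if you don’t need a fixed ID-to-kernel mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel IDs; names must match the __kernel functions. */
typedef enum { K_ADD, K_SCALE, K_REDUCE, K_COUNT } kernel_id;

static const char *const kernel_names[K_COUNT] = {
    [K_ADD] = "add",
    [K_SCALE] = "scale",
    [K_REDUCE] = "reduce",
};

/* Stand-ins for the OpenCL calls: `create` would wrap clCreateKernel,
 * `release` would wrap clReleaseKernel. */
typedef void *(*create_fn)(void *ctx, const char *name);
typedef void (*release_fn)(void *handle);

typedef struct {
    void *handles[K_COUNT];
} kernel_table;

/* Create every kernel once, at start-up. On failure, release the ones
 * already created and return the failing id; return -1 on success. */
static int kernel_table_init(kernel_table *t, void *ctx,
                             create_fn create, release_fn release)
{
    *t = (kernel_table){ { NULL } };
    for (int i = 0; i < K_COUNT; i++) {
        t->handles[i] = create(ctx, kernel_names[i]);
        if (t->handles[i] == NULL) {
            for (int j = 0; j < i; j++)
                release(t->handles[j]);
            *t = (kernel_table){ { NULL } };
            return i;
        }
    }
    return -1;
}

/* Constant-time lookup for the life of the program; no rebuild needed. */
static void *kernel_get(const kernel_table *t, kernel_id id)
{
    assert((int)id >= 0 && (int)id < K_COUNT);
    return t->handles[id];
}

/* Release everything at shutdown. */
static void kernel_table_release(kernel_table *t, release_fn release)
{
    for (int i = 0; i < K_COUNT; i++)
        if (t->handles[i] != NULL)
            release(t->handles[i]);
    *t = (kernel_table){ { NULL } };
}

/* Demo stand-ins so the sketch runs without an OpenCL device:
 * creation fails for the kernel named by ctx (or never, if ctx is NULL). */
static int demo_live = 0; /* number of fake handles currently alive */

static void *demo_create(void *ctx, const char *name)
{
    const char *fail = ctx;
    if (fail != NULL && strcmp(fail, name) == 0)
        return NULL;
    demo_live++;
    return (void *)name; /* any non-NULL pointer serves as a fake handle */
}

static void demo_release(void *handle)
{
    (void)handle;
    demo_live--;
}
```

Creating every kernel in kernel_table_init at start-up follows the up-front approach described above; a lazy variant would move the same creation cost to the first call of each kernel.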