The short version: Is it possible to asynchronously enqueue two kernels such that one output of the first kernel is used as the global work size of the second kernel?
A longer example: Suppose I have some kernel that does Real Work on a large, sparse dataset. It might be worthwhile to run a preprocess kernel that scans the dataset for valid/non-empty work items and copies them to a smaller array, and then have my main kernel operate only on that smaller, dense array of known-valid data. One of the outputs of the preprocess kernel would naturally be the number of valid data elements, which would then be the global work size of the second kernel.

Naively, this means the CPU needs to kick off the preprocess kernel, wait for it to finish, read back the number of valid data elements, and only then kick off the Real Work kernel. It would of course be preferable to enqueue both kernels up front (using events to make sure the first kernel finishes before the second starts).

Is that possible in OpenCL? It looks like the global work size needs to be specified when the kernel is enqueued; is there any way to tell the CL runtime to defer loading the global work size until just before it’s needed?
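For concreteness, here is a sequential C sketch of the two stages I have in mind. It is a CPU stand-in, not OpenCL code: the names `is_valid`, `compact`, and `real_work` are made up for illustration, "empty" is assumed to mean zero, and the doubling in `real_work` is just a placeholder for the actual computation. The point is only that the count returned by the first stage is exactly the value the second stage needs as its work size.

```c
#include <stddef.h>

/* Hypothetical validity test: an element counts as "empty" when it is zero. */
int is_valid(float x)
{
    return x != 0.0f;
}

/* Sequential stand-in for the preprocess kernel: copy the valid elements of
 * `sparse` into `dense` (which must hold at least n elements) and return
 * how many were copied. */
size_t compact(const float *sparse, size_t n, float *dense)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_valid(sparse[i])) {
            dense[count++] = sparse[i];
        }
    }
    return count;
}

/* Sequential stand-in for the Real Work kernel: each loop iteration plays
 * the role of one work item, so the loop bound is the global work size. */
void real_work(float *dense, size_t global_work_size)
{
    for (size_t gid = 0; gid < global_work_size; ++gid) {
        dense[gid] *= 2.0f; /* placeholder computation */
    }
}
```

On the CPU the dependency is trivial: call `compact`, then pass its return value to `real_work`. With two enqueued kernels, that return value lives in device memory, but the second enqueue appears to need it as a host-side argument, which is the crux of the question.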