> 1) When is the data object actually transferred between main memory and the OpenCL devices?
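During the clEnqueueRead*/clEnqueueWrite* operations (for example clEnqueueReadBuffer and clEnqueueWriteBuffer) or the clEnqueueMap*/clEnqueueUnmapMemObject operations.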
> 2) How to deal with the GPU used by the OS? If it is overloaded, this guy likes to stop you!
> Is there any way to find out which GPU is used by the OS? I don’t want to touch this camel.
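Use the OpenGL interop “get device” query to find the GPU behind the current OpenGL context, which is usually the one being used by the OS, for example clGetGLContextInfoKHR with CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR from the cl_khr_gl_sharing extension. It’s not perfect, but it often works.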
> 3) If I have a two-level nested loop, and the outer level is big enough to run in parallel, I want to know which one is better:
> a) make a 1-dimensional clEnqueueNDRangeKernel, and in the kernel handle the other level with a for statement.
> b) just make a 2-dimensional clEnqueueNDRangeKernel.
> I was told that for, if, and while statements will badly slow down the kernel, but in my case the loop lets me avoid the synchronization problem for the sum.
Implement both, try them, and measure the results. Which one wins depends on your device and your data sizes.
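To make the difference between (a) and (b) concrete, here is a minimal sketch in plain C that only emulates the two work-item bodies on the host. It is not OpenCL code: `gid`, `gid0` and `gid1` stand in for `get_global_id()`, the `run_*` functions stand in for clEnqueueNDRangeKernel, and all function names are made up for illustration. I am assuming a row-major array and a per-row sum, since that is what your description sounds like.

```c
#include <stddef.h>

/* Variant (a): 1-D range, one work-item per row.
 * The inner for loop is serial inside the work-item, so the
 * sum needs no synchronization at all. */
static void row_sum_1d_item(const float *in, float *out,
                            size_t cols, size_t gid)
{
    float acc = 0.0f;
    for (size_t j = 0; j < cols; ++j)
        acc += in[gid * cols + j];
    out[gid] = acc;
}

/* Variant (b): 2-D range, one work-item per element.
 * On a real device many work-items would update out[gid0] at the
 * same time, so this += would have to become an atomic operation
 * or a work-group reduction. It is only safe here because the
 * host emulation below runs the work-items one after another. */
static void row_sum_2d_item(const float *in, float *out,
                            size_t cols, size_t gid0, size_t gid1)
{
    out[gid0] += in[gid0 * cols + gid1];
}

/* Serial stand-in for a 1-D clEnqueueNDRangeKernel launch. */
void run_1d(const float *in, float *out, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i)
        row_sum_1d_item(in, out, cols, i);
}

/* Serial stand-in for a 2-D clEnqueueNDRangeKernel launch. */
void run_2d(const float *in, float *out, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i)
        out[i] = 0.0f;                 /* accumulators must start at zero */
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            row_sum_2d_item(in, out, cols, i, j);
}
```

Both variants produce the same sums. What differs on a device is the cost: (a) pays for a serial loop per work-item, (b) pays for the synchronization on the shared accumulator. That trade-off is exactly what you have to time on your own hardware.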