Currently I’m using combined CPU-GPU computation with OpenCL to reduce CPU overhead by offloading some computations to the GPU.
Several GPU blocks are interleaved with the CPU routines, so data transfer between host and device is necessary. For faster transfer, I create the buffers with CL_MEM_ALLOC_HOST_PTR and map them so the host can access them through a pointer.
However, whenever a data transfer occurs, the CPU load becomes very high (I checked this with the ‘top’ command on Linux). In fact, the added overhead is almost the same as using memcpy() to copy the same amount of data on the CPU.
Is there any way to minimize host (CPU) usage during data transfer? Or is this an inevitable cost in this environment?
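
For anyone who wants to reproduce the memcpy() side of the comparison, a baseline could be measured with something like the sketch below. This is only an illustration under assumptions, not my actual code: the function name memcpy_cpu_seconds and its parameters are hypothetical, and it relies on the standard clock() call, which reports processor time used by the process rather than wall-clock time.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: returns the CPU seconds consumed by `repeats`
 * memcpy() calls of `bytes` bytes each, or -1.0 if allocation fails. */
double memcpy_cpu_seconds(size_t bytes, int repeats)
{
    if (bytes == 0 || repeats <= 0)
        return 0.0;

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not counted below. */
    memset(src, 1, bytes);
    memset(dst, 0, bytes);

    volatile unsigned char sink = 0;  /* keeps the copied data "used" */

    clock_t start = clock();
    for (int i = 0; i < repeats; ++i) {
        src[0] = (unsigned char)i;    /* vary the source between copies */
        memcpy(dst, src, bytes);
        sink = (unsigned char)(sink + dst[bytes - 1]);
    }
    clock_t end = clock();

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling, for example, memcpy_cpu_seconds(64u * 1024 * 1024, 10) with the same size as the mapped buffer gives a CPU-time figure to set against what ‘top’ shows during the OpenCL transfer. An optimizing compiler may still simplify the loop, so the result should be treated as a rough figure.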