How to efficiently manage memory for both iGPU and GPU?

Hi. I’m writing code that must be able to run on both an iGPU and a discrete GPU. To be clear, I’m not trying to use them simultaneously.
For the iGPU I create a single buffer in host memory, do some computation on the host (reading and writing to that buffer through a mapped pointer) and then just pass that buffer to the kernel, which is enqueued several times. I believe this should be the most efficient method, since the buffer is zero-copy.
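Roughly, the iGPU path looks like this (just a sketch: it assumes CL_MEM_ALLOC_HOST_PTR gives zero-copy memory on the iGPU, and that the context and queue are created elsewhere):

```c
#include <CL/cl.h>

/* Sketch of the iGPU path: one host-visible buffer that both the CPU and
   the kernel use directly. Assumes an already-created context and queue. */
cl_mem make_shared_buffer(cl_context ctx, cl_command_queue queue, size_t size)
{
    cl_int err;
    /* CL_MEM_ALLOC_HOST_PTR asks the runtime for host-accessible memory,
       which on an iGPU is typically zero-copy. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                size, NULL, &err);

    /* Map to get a host pointer, do the host-side work, then unmap
       before the buffer is handed to a kernel. */
    float *ptr = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                             CL_MAP_READ | CL_MAP_WRITE,
                                             0, size, 0, NULL, NULL, &err);
    for (size_t i = 0; i < size / sizeof(float); ++i)
        ptr[i] = (float)i;   /* stand-in for the real host-side computation */
    clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);

    return buf;   /* passed straight to clSetKernelArg, no explicit copy */
}
```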
For a discrete GPU, AFAIK, that approach won’t really work, because the data would have to be sent to the device on every kernel launch. So I have to create an additional buffer that resides in device memory and copy the host buffer into it before launching the kernels.
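So for the discrete GPU I’d end up with something like this instead (again only a sketch with an assumed context and queue; the kernels would then run against dev_buf repeatedly without touching RAM):

```c
#include <CL/cl.h>

/* Sketch of the discrete-GPU path: a staging copy in host memory plus a
   separate buffer in device memory, with one explicit copy up front. */
cl_mem upload_to_device(cl_context ctx, cl_command_queue queue,
                        const void *host_data, size_t size)
{
    cl_int err;
    /* No host-pointer flags, so the runtime is free to place this in VRAM. */
    cl_mem dev_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, size, NULL, &err);

    /* One blocking host-to-device copy before the kernels are enqueued. */
    clEnqueueWriteBuffer(queue, dev_buf, CL_TRUE, 0, size, host_data,
                         0, NULL, NULL);
    return dev_buf;
}
```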
On the other hand, the discrete-GPU method isn’t optimal for iGPU systems, because the device shares memory with the CPU and the buffer doesn’t need to be copied at all.
Is my reasoning correct? Is there a way of managing buffers that works optimally for both an iGPU and a discrete GPU? Maybe there is a way to programmatically detect whether the CPU and GPU share the same memory?
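For concreteness, the kind of check I have in mind would look something like this (a sketch; I’m assuming CL_DEVICE_HOST_UNIFIED_MEMORY is the right thing to query, though it is deprecated as of OpenCL 2.0):

```c
#include <CL/cl.h>

/* Sketch: query whether the device shares memory with the host.
   CL_DEVICE_HOST_UNIFIED_MEMORY is CL_TRUE for typical iGPUs; the result
   could drive the choice between the zero-copy and the explicit-copy path. */
int device_shares_host_memory(cl_device_id dev)
{
    cl_bool unified = CL_FALSE;
    clGetDeviceInfo(dev, CL_DEVICE_HOST_UNIFIED_MEMORY,
                    sizeof(unified), &unified, NULL);
    return unified == CL_TRUE;
}
```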

I tested the code that I wrote for the iGPU on an Nvidia GPU, and the performance doesn’t look as awful as I would expect if the buffer were being sent from RAM to the GPU on every kernel enqueue. I’m still not sure it is the optimal approach, because the Nvidia guide suggests creating a separate buffer for RAM and one for GPU memory and manually copying between them before queuing kernels. Profiling would help a lot, but Nvidia seems to have deprecated their OpenCL tools.
The AMD guide explicitly states that buffers residing in RAM won’t be cached in GPU memory and will be transferred every time a kernel is executed [AMD_OpenCL_Programming_Optimization_Guide2.pdf, page 35, option 5]. Which is super sad.