It is then streamed over the network and eventually displayed to the user. Like Google Maps, but different.
You might say that network latency dominates any latency which may arise from suboptimal Vulkan practices, but still, even if only for “pedagogical” reasons, I would like to understand this.
Ah, so it is practically not offscreen in the end and needs real-time user input.
You might say that network latency dominates the latency
I wouldn’t dare. I have < 10 ms to some sites on the internet, while one frame at 60 Hz is ~17 ms. So rendering latency and network latency can be quite comparable nowadays. In fact, if all of this work is to be done on the server for multiple users, I would even be afraid that the rendering hardware shared among several users may become the bottleneck.
What would be a recommendation in terms of memory types, queues, synchronization etc.?
Device-local memory (if applicable).
AMD has this small amount of unusual device-local, host-visible, host-coherent memory (typically a 256 MiB heap). Probably worth investigating for some uses.
Rendering = GRAPHICS queue. It may be beneficial to use a separate COMPUTE queue if there is some clearly defined compute subproblem. Separate TRANSFER queues should be beneficial for the appropriate operations.
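To make the “separate TRANSFER queue” idea concrete, here is a minimal sketch of picking a transfer-only queue family. The bit values mirror `VkQueueFlagBits` from the spec; the function itself is plain C operating on the flag words you would get from `vkGetPhysicalDeviceQueueFamilyProperties`, so the function name and overall shape are my own illustration, not canonical API.

```c
#include <stdint.h>

/* Bit values mirror VkQueueFlagBits from the Vulkan specification. */
#define QUEUE_GRAPHICS_BIT 0x1u
#define QUEUE_COMPUTE_BIT  0x2u
#define QUEUE_TRANSFER_BIT 0x4u

/* Returns the index of a transfer-only queue family (typically backed
 * by the dedicated copy engines), or -1 if none exists. */
static int find_dedicated_transfer_family(const uint32_t *flags, int count)
{
    for (int i = 0; i < count; ++i) {
        if ((flags[i] & QUEUE_TRANSFER_BIT) &&
            !(flags[i] & (QUEUE_GRAPHICS_BIT | QUEUE_COMPUTE_BIT)))
            return i;
    }
    return -1;
}
```

Note that GRAPHICS and COMPUTE families implicitly support transfer too, so falling back to one of those when no dedicated family exists is always valid.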
Well, you always have to synchronize. The trick is to keep the GPU (and, well, even the CPU) always fed, i.e. to have alternative work ready while it waits on some Semaphore or Fence.
I’ll try to explain a bit more how I (barely) understand this should work – it would be great if you could provide some more input:
I’ve heard there are copy engines for async data transfer to/from the GPU. I did not understand whether everything goes through them or not. Anyway, I thought it would be best to upload textures (for map data) asynchronously via a dedicated transfer queue.
I would upload the camera MVP via a staging buffer to a device-local buffer using the graphics queue.
They say push constants are faster than uniforms, so maybe I should use them for just the MVP?
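For something as small as one 4×4 MVP matrix (64 bytes), push constants fit comfortably within the 128-byte `maxPushConstantsSize` minimum that the spec guarantees, and they avoid both the staging-buffer upload and the descriptor set. A rough sketch (variable names like `cmd`, `layout`, `mvp` are placeholders; error handling omitted):

```c
/* Pipeline layout declaring 64 bytes of push constants for the vertex stage. */
VkPushConstantRange range = {
    .stageFlags = VK_SHADER_STAGE_VERTEX_BIT,
    .offset     = 0,
    .size       = sizeof(float) * 16,   /* one 4x4 MVP matrix */
};
VkPipelineLayoutCreateInfo layoutInfo = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,
    .pushConstantRangeCount = 1,
    .pPushConstantRanges    = &range,
};
/* vkCreatePipelineLayout(device, &layoutInfo, NULL, &layout); */

/* Per frame, while recording the command buffer: */
vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_VERTEX_BIT,
                   0, sizeof(float) * 16, mvp);
```

The push is recorded directly into the command buffer, so there is nothing to synchronize on the host side for the MVP itself.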
There is no swapchain, so I would create a device-local image for the framebuffer color attachment, then download its contents via host-visible, non-coherent memory.
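One detail with host-visible memory that is not `HOST_COHERENT`: before the CPU reads the mapped readback buffer, you must call `vkInvalidateMappedMemoryRanges`, and the range’s offset and size must be multiples of `VkPhysicalDeviceLimits::nonCoherentAtomSize`. A small pure-C helper for that alignment (the helper name is my own; the requirement itself is from the spec):

```c
#include <stdint.h>

/* Aligns a mapped-memory range to nonCoherentAtomSize, as required by
 * vkInvalidateMappedMemoryRanges / vkFlushMappedMemoryRanges on
 * host-visible memory that is not HOST_COHERENT. `atom` must be a
 * power of two (VkPhysicalDeviceLimits::nonCoherentAtomSize is). */
static void align_noncoherent_range(uint64_t offset, uint64_t size,
                                    uint64_t atom,
                                    uint64_t *out_offset, uint64_t *out_size)
{
    uint64_t begin = offset & ~(atom - 1);                      /* round down */
    uint64_t end   = (offset + size + atom - 1) & ~(atom - 1);  /* round up */
    *out_offset = begin;
    *out_size   = end - begin;
}
```

The results would go into `VkMappedMemoryRange::offset` and `::size` before invalidating.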
Without the swapchain, there is no need to acquire images from it. However, I somehow need to wait until the framebuffer readback is complete before overwriting it. Not sure what else?
“Somehow”? Vulkan really only has one way for the CPU to wait on something to finish. Well, yes, you can WaitIdle on the queue or the device directly, but those are obviously the wrong answers. Which leaves the only other tool Vulkan has: fences.
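The usual pattern, sketched below with placeholder names (`graphicsQueue`, `frameFence`, etc. are assumptions, not canonical): signal a fence on the submit that ends with the readback copy, then wait on it before touching the image or the readback buffer again.

```c
/* Submit rendering + readback copy, signaling `frameFence` on completion. */
vkQueueSubmit(graphicsQueue, 1, &submitInfo, frameFence);

/* ...later, before overwriting the offscreen image or reading the
 * host-visible readback buffer: */
vkWaitForFences(device, 1, &frameFence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &frameFence);  /* fences are not auto-reset */
```

With one fence per in-flight frame, the wait only blocks when the CPU actually gets ahead of the GPU.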
If we are talking about one pipe and the same workload, then latency and bandwidth are really the same thing, i.e. the faster the result is computed, the sooner you get it.
For the single-buffered approach, the worst case would be when the input is received just after rendering has already started, so the latency would be:
rendering of the previous frame + readback/waiting for the framebuffer to become available + rendering of this frame + readback
For the double-buffered approach, the worst case would be:
rendering of the previous frame + rendering of this frame + readback
There’s a chance that the latency would be hidden in the single-buffered case (i.e. you don’t need the framebuffer ready until the writing phase), but the double-buffered approach is more predictable about it.
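The two worst-case sums above can be sketched as trivial arithmetic. The 8 ms render and 2 ms readback figures in the usage below are purely hypothetical inputs for illustration, not measurements:

```c
/* Worst-case latency sums for the two schemes, in milliseconds. */

/* Single-buffered: previous frame's rendering + waiting for its readback
 * to free the framebuffer + this frame's rendering + this frame's readback. */
static double worst_case_single(double render_ms, double readback_ms)
{
    return render_ms + readback_ms + render_ms + readback_ms;
}

/* Double-buffered: previous frame's rendering + this frame's rendering
 * + readback; the other framebuffer is already free, so no readback wait. */
static double worst_case_double(double render_ms, double readback_ms)
{
    return render_ms + render_ms + readback_ms;
}
```

With, say, 8 ms rendering and 2 ms readback, that is 20 ms worst case single-buffered versus 18 ms double-buffered: the saving is exactly one readback wait.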