What's the best way to retrieve the rendered result?

Twanks123 · January 6, 2017, 4:33am

Hello guys! I’m back.

I will soon start modifying my engine to render images on request.
To do this I have to retrieve the rendered result from the GPU, and I want to know the best way to do this in Vulkan.
I thought of two ways:
1.) Directly render into a host-visible buffer and then retrieve the result via vkMapMemory etc.
(This might be a bit slow, right? Does anybody know how fast device-local memory is compared to host-visible memory? Just curious.)

2.) Render into normal device-local memory, submit a copy command to copy the rendered result into a host-visible buffer, and then follow the same procedure as above: retrieve it with vkMapMemory etc.
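Both options end with mapping a host-visible allocation, so the first practical step is picking its memory type. Below is a minimal sketch of the usual selection loop, assuming the property flags have already been read from vkGetPhysicalDeviceMemoryProperties and the allowed-type mask from vkGetBufferMemoryRequirements. The names MEM_*, find_memory_type and pick_readback_type are hypothetical; only the flag values are taken from VkMemoryPropertyFlagBits.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for VkMemoryPropertyFlagBits (same values, hypothetical names). */
enum {
    MEM_DEVICE_LOCAL  = 0x1, /* VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT  */
    MEM_HOST_VISIBLE  = 0x2, /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT  */
    MEM_HOST_COHERENT = 0x4, /* VK_MEMORY_PROPERTY_HOST_COHERENT_BIT */
    MEM_HOST_CACHED   = 0x8  /* VK_MEMORY_PROPERTY_HOST_CACHED_BIT   */
};

/* Return the first memory type index allowed by type_bits
 * (VkMemoryRequirements::memoryTypeBits) that has all `required` flags,
 * or -1 if none does. flags[i] stands in for
 * VkPhysicalDeviceMemoryProperties::memoryTypes[i].propertyFlags. */
static int find_memory_type(const uint32_t *flags, uint32_t count,
                            uint32_t type_bits, uint32_t required)
{
    assert(count <= 32); /* VK_MAX_MEMORY_TYPES */
    for (uint32_t i = 0; i < count; ++i) {
        if ((type_bits & (1u << i)) && (flags[i] & required) == required)
            return (int)i;
    }
    return -1;
}

/* For CPU readback, prefer HOST_CACHED memory (fast CPU reads; if it is not
 * also HOST_COHERENT, call vkInvalidateMappedMemoryRanges before reading),
 * else fall back to HOST_VISIBLE | HOST_COHERENT. */
static int pick_readback_type(const uint32_t *flags, uint32_t count,
                              uint32_t type_bits)
{
    int idx = find_memory_type(flags, count, type_bits,
                               MEM_HOST_VISIBLE | MEM_HOST_CACHED);
    if (idx < 0)
        idx = find_memory_type(flags, count, type_bits,
                               MEM_HOST_VISIBLE | MEM_HOST_COHERENT);
    return idx;
}
```

Preferring cached memory is a judgment call for readback specifically; for uploads, uncached write-combined memory is usually the better fit.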

Are there other ways or even better ways to achieve this?

Thanks in advance
SH

krOoze · January 6, 2017, 5:50am

Speed is platform dependent.
1.) I would assume it is as slow as the copy, except that you have no explicit control over when the copy happens or how it is partitioned. So on a typical CPU + discrete GPU setup I would assume it is as slow as 2.) or worse.
2.) Seems to be the robust and conventional way to do it.
Some platforms (AMD) have device-local host-visible memory. Based on previous discussions, I got the impression it is intended for small, low-latency transfers of constants or similar. It might be worth measuring for this case.
Some platforms (integrated GPUs) also have device-local host-visible memory, which in this case is simply the same (unified) memory. You should take advantage of this and skip the copy.
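Telling the two "device-local host-visible" cases apart needs more than the memory flags. Below is a minimal sketch of one possible heuristic, assuming you combine the memory-type flags with VkPhysicalDeviceProperties::deviceType: skip the staging copy only on an integrated GPU that exposes a DEVICE_LOCAL | HOST_VISIBLE type. The function names and MEM_*/DEVICE_TYPE_* constants are hypothetical stand-ins whose values mirror the Vulkan enums. This also assumes the image can actually be rendered with linear tiling, since an optimally tiled image cannot be read directly even from unified memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT / HOST_VISIBLE_BIT. */
enum { MEM_DEVICE_LOCAL = 0x1, MEM_HOST_VISIBLE = 0x2 };

/* Same values as VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU / DISCRETE_GPU. */
enum { DEVICE_TYPE_INTEGRATED_GPU = 1, DEVICE_TYPE_DISCRETE_GPU = 2 };

/* True if any memory type is both device-local and host-visible. */
static bool has_device_local_host_visible(const uint32_t *flags, uint32_t count)
{
    assert(count <= 32); /* VK_MAX_MEMORY_TYPES */
    const uint32_t want = MEM_DEVICE_LOCAL | MEM_HOST_VISIBLE;
    for (uint32_t i = 0; i < count; ++i)
        if ((flags[i] & want) == want)
            return true;
    return false;
}

/* Heuristic only: on a discrete GPU such a type (e.g. AMD's) is a small
 * window that is worth measuring rather than assuming; on an integrated
 * GPU it is the same unified memory, so the copy can be skipped. */
static bool can_skip_staging_copy(int device_type,
                                  const uint32_t *flags, uint32_t count)
{
    return device_type == DEVICE_TYPE_INTEGRATED_GPU &&
           has_device_local_host_visible(flags, count);
}
```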

Alfonse_Reinheart · January 6, 2017, 6:33am

Twanks123 wrote:
> 1.) Directly render into a host-visible buffer and then retrieve the result via vkMapMemory etc.
> (This might be a bit slow, right? Does anybody know how fast device-local memory is compared to host-visible memory? Just curious.)

It may not be possible. Don’t forget: specific uses of images may only be permitted with specific memory types. vkGetImageMemoryRequirements reports, through the memoryTypeBits mask, which memory types an image may be bound to, so it can tell you whether a particular piece of hardware is even able to render into host-visible memory.
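The check described above amounts to testing whether any memory type allowed by memoryTypeBits carries the host-visible flag. Below is a minimal sketch, assuming memoryTypeBits came from vkGetImageMemoryRequirements for the render-target image and the flags from vkGetPhysicalDeviceMemoryProperties; image_can_be_host_visible and MEM_HOST_VISIBLE are hypothetical names (the value matches VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { MEM_HOST_VISIBLE = 0x2 }; /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT */

/* type_bits: VkMemoryRequirements::memoryTypeBits for the image.
 * flags[i]:  VkPhysicalDeviceMemoryProperties::memoryTypes[i].propertyFlags.
 * Returns true if the image may be bound to at least one host-visible type. */
static bool image_can_be_host_visible(uint32_t type_bits,
                                      const uint32_t *flags, uint32_t count)
{
    assert(count <= 32); /* VK_MAX_MEMORY_TYPES */
    for (uint32_t i = 0; i < count; ++i)
        if ((type_bits & (1u << i)) && (flags[i] & MEM_HOST_VISIBLE))
            return true;
    return false;
}
```

If this returns false, option 1.) is simply unavailable on that device and the copy is required.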

Also, it is entirely possible that host-visible memory is the only type of memory available, in which case it would also be device-local.

Option 2.) is one you can use blindly, without querying requirements. Of course, if there is only one memory type, then you’re needlessly doing a copy.

Columbo · January 9, 2017, 8:03am

I think #2 is the only viable option, because renderable surfaces are going to be non-linear (VK_IMAGE_TILING_OPTIMAL rather than VK_IMAGE_TILING_LINEAR). If that’s the case, then regardless of memory type, even on platforms with unified memory, you need to do a copy to deswizzle the data into something that makes sense on the CPU, because the memory layout of VK_IMAGE_TILING_OPTIMAL images is implementation-defined.

I might be wrong, and maybe some hardware lets you render into linear images, but I imagine the performance impact would be significant enough to make it a poor choice.
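Even where rendering into a linear image is allowed (reported by VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT in the linearTilingFeatures returned by vkGetPhysicalDeviceFormatProperties), reading it through a mapping means honoring the offset and rowPitch from vkGetImageSubresourceLayout, because rows may be padded. By contrast, vkCmdCopyImageToBuffer with bufferRowLength and bufferImageHeight set to 0 writes tightly packed rows. Below is a minimal sketch of the pitched-row copy, assuming src already points at the mapped memory plus the subresource offset; copy_pitched_rows is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a linear image whose rows start row_pitch bytes apart
 * (VkSubresourceLayout::rowPitch) into a tightly packed destination of
 * width * height * bytes_per_pixel bytes. */
static void copy_pitched_rows(uint8_t *dst, const uint8_t *src,
                              uint32_t width, uint32_t height,
                              uint32_t bytes_per_pixel, size_t row_pitch)
{
    size_t row_bytes = (size_t)width * bytes_per_pixel;
    assert(row_pitch >= row_bytes); /* padding can only add bytes */
    for (uint32_t y = 0; y < height; ++y)
        memcpy(dst + (size_t)y * row_bytes, src + (size_t)y * row_pitch,
               row_bytes);
}
```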

krOoze · January 9, 2017, 9:14am

That’s a good point too.

Oh, and there might be some benefit in using a dedicated transfer queue, so the copy can run while the GPU is already computing something else.