Hello guys! I’m back
I will soon start modifying my engine to render images on request.
To do this I have to retrieve the rendered result from the GPU, and I want to know the best way to do this in Vulkan.
I thought of two ways:
1.) Render directly into a host-visible buffer and then retrieve the result via vkMapMemory etc.
(This might be a bit slow, right? Does anybody know how fast device-local memory is compared to host-visible memory? Just curious.)
2.) Render into normal device-local memory and submit a copy command to copy the rendered result into a host-visible buffer, then follow the same procedure as above: retrieve it with vkMapMemory etc.
Are there other ways or even better ways to achieve this?
Thanks in advance
[QUOTE=Twanks123;41661]1.) Render directly into a host-visible buffer and then retrieve the result via vkMapMemory etc.
(This might be a bit slow, right? Does anybody know how fast device-local memory is compared to host-visible memory? Just curious.)[/quote]
It may not be possible. Don’t forget: specific uses of images may only be permitted with specific memory types.
vkGetImageMemoryRequirements tells you, through memoryTypeBits, which memory types a given image may be bound to, and so whether a particular piece of hardware is even able to render into host-visible memory.
Also, it is entirely possible that host-visible memory is the only type of memory available. It would also be device-local.
Option 2 is one you can do blindly, without querying requirements. Of course, if there is only one memory type, then you’re needlessly doing a copy.
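For what it’s worth, the memory-type check boils down to something like the sketch below. It is only a rough illustration: pick_memory_type and the FLAG_* constants are names I made up so the snippet compiles without vulkan.h. The flag values mirror VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, type_flags stands in for the propertyFlags of each entry in VkPhysicalDeviceMemoryProperties, and allowed_bits stands in for VkMemoryRequirements::memoryTypeBits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Vulkan memory property bits (same numeric values),
 * so this sketch builds without vulkan.h. */
enum {
    FLAG_DEVICE_LOCAL  = 0x1, /* VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT  */
    FLAG_HOST_VISIBLE  = 0x2, /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT  */
    FLAG_HOST_COHERENT = 0x4  /* VK_MEMORY_PROPERTY_HOST_COHERENT_BIT */
};

/* Hypothetical helper, not part of Vulkan.
 * type_flags[i]  : property flags of memory type i
 * allowed_bits   : bit i set means the resource may use memory type i
 * wanted         : property flags the caller requires
 * Returns the first matching memory type index, or -1 if none exists. */
int pick_memory_type(const uint32_t *type_flags, uint32_t type_count,
                     uint32_t allowed_bits, uint32_t wanted)
{
    assert(type_flags != NULL);
    assert(type_count <= 32);

    for (uint32_t i = 0; i < type_count; ++i) {
        int allowed = (allowed_bits & (UINT32_C(1) << i)) != 0;
        int has_all = (type_flags[i] & wanted) == wanted;
        if (allowed && has_all)
            return (int)i;
    }
    return -1;
}
```

If this returns -1 when you ask for FLAG_HOST_VISIBLE with the render target’s allowed bits, option 1 is off the table on that device. If the only memory type is both device-local and host-visible, the copy in option 2 is the redundant one.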
I think #2 is the only viable option, because renderable surfaces are going to be non-linear (VK_IMAGE_TILING_OPTIMAL rather than VK_IMAGE_TILING_LINEAR). If that’s the case then, regardless of memory types and even on platforms with unified memory, you need to do a copy to deswizzle the data into something that’s going to make sense on the CPU (the VK_IMAGE_TILING_OPTIMAL layout is implementation-defined).
I might be wrong and maybe you can render into linear textures on some hardware, but I imagine the performance impact would be significant enough that it would be a poor choice.
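To make the layout point concrete: once the pixels sit in a host-visible buffer (or in a linear image), the CPU just sees rows of bytes with some stride. Here is a minimal sketch in C, assuming either a tightly packed copy (as far as I know, leaving bufferRowLength at 0 in VkBufferImageCopy gives you that) or a linear image whose stride comes from VkSubresourceLayout::rowPitch. repack_rows and pixel_offset are made-up helper names, not Vulkan calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset of pixel (x, y) in mapped memory whose
 * rows are 'pitch' bytes apart. For a tightly packed buffer, pitch is
 * width * bytes_per_pixel; for a linear image it would be the rowPitch. */
size_t pixel_offset(size_t x, size_t y, size_t pitch, size_t bytes_per_pixel)
{
    return y * pitch + x * bytes_per_pixel;
}

/* Hypothetical helper: copies 'height' rows of 'row_bytes' bytes each from a
 * mapped source with stride 'src_pitch' into a tightly packed destination,
 * dropping any per-row padding. */
void repack_rows(uint8_t *dst, const uint8_t *src,
                 size_t row_bytes, size_t src_pitch, size_t height)
{
    assert(dst != NULL && src != NULL);
    assert(src_pitch >= row_bytes);

    for (size_t y = 0; y < height; ++y)
        memcpy(dst + y * row_bytes, src + y * src_pitch, row_bytes);
}
```

None of this works on a VK_IMAGE_TILING_OPTIMAL image mapped directly, since there is no row stride you can rely on there. That is why the copy in option 2 does the deswizzling for you.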
That’s a good point too.
Oh, and there might be some benefit to using a dedicated transfer queue, so the copy can run while the GPU is already computing something else.
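Finding such a queue could look roughly like this. It is just a sketch: find_dedicated_transfer_family and the Q_* constants are names I invented so it builds without vulkan.h. The values mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT, and queue_flags stands in for the queueFlags you get per family from vkGetPhysicalDeviceQueueFamilyProperties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Vulkan queue capability bits (same numeric values). */
enum {
    Q_GRAPHICS = 0x1, /* VK_QUEUE_GRAPHICS_BIT */
    Q_COMPUTE  = 0x2, /* VK_QUEUE_COMPUTE_BIT  */
    Q_TRANSFER = 0x4  /* VK_QUEUE_TRANSFER_BIT */
};

/* Hypothetical helper, not part of Vulkan.
 * Returns the index of the first queue family that supports transfer but
 * neither graphics nor compute, or -1 if the device has no such family. */
int find_dedicated_transfer_family(const uint32_t *queue_flags,
                                   uint32_t family_count)
{
    assert(queue_flags != NULL);

    for (uint32_t i = 0; i < family_count; ++i) {
        uint32_t f = queue_flags[i];
        if ((f & Q_TRANSFER) && !(f & (Q_GRAPHICS | Q_COMPUTE)))
            return (int)i;
    }
    return -1;
}
```

If it returns -1, you can just record the copy on the graphics queue, since graphics and compute queues can do transfers as well.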