I’m preparing to port an OpenGL ES 3.1 application to Vulkan, and there is one particular scenario where I’d like to work out the most performant approach up front:
I issue multiple draw calls from the same VBO, sometimes over the same range and sometimes over different ranges. There has to be an absolute ordering between pixels of different draw calls, meaning that a pixel of one draw call may never appear in front of a pixel of certain other draw calls. I cannot use the depth buffer for this because it is already used for other things. For the ordering I use glStencilFunc with a certain reference value and GL_GREATER/GL_LESS. Multiple VBOs are rendered per frame, and data from different VBOs may use the same reference value.
The problem is that I issue a lot of draw calls and each call needs a glStencilFunc update, which causes a lot of CPU overhead. For the original OpenGL app an alternative was devised that fetches pixels from the framebuffer using GL_EXT_shader_framebuffer_fetch and then discards pixels in the GLSL shader. Since this avoids the glStencilFunc calls and there are no other state changes, it allows me to use one draw call per VBO.
Now, for the planned switch to Vulkan, I learned two things. First, draw commands are supposedly cheaper on the CPU than in OpenGL. Second, the stencil reference value can be made dynamic pipeline state by listing VK_DYNAMIC_STATE_STENCIL_REFERENCE (a VkDynamicState) when the pipeline is created, and it can then be set with vkCmdSetStencilReference.
So my question is, what do you think would be more performant for the upcoming Vulkan implementation:
1. Port the blending/GL_EXT_shader_framebuffer_fetch based solution to Vulkan with one draw call per VBO.
2. Port the original solution with multiple draw calls per VBO, using VkDynamicState and replacing each glStencilFunc call with a vkCmdSetStencilReference call.
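To make the second option concrete, here is a minimal sketch of how the per-draw stencil reference updates might be recorded, skipping the update when consecutive draws share a reference value. It is only an illustration under assumptions: DrawRange, Recorder, CallLog and record_draws are hypothetical names, not Vulkan types, and the two callbacks stand in for vkCmdSetStencilReference and vkCmdDraw so that the snippet compiles without the Vulkan headers. Draws are kept in submission order, since reordering them could change which pixel wins the stencil test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one draw: a vertex range in the VBO plus the
   stencil reference value it needs. Not a Vulkan type. */
typedef struct {
    uint32_t first_vertex;
    uint32_t vertex_count;
    uint32_t stencil_ref;
} DrawRange;

/* Hypothetical recorder interface. In a real renderer, set_stencil_ref would
   wrap vkCmdSetStencilReference(cmd, VK_STENCIL_FACE_FRONT_AND_BACK, ref)
   and draw would wrap vkCmdDraw(cmd, vertex_count, 1, first_vertex, 0). */
typedef struct {
    void (*set_stencil_ref)(void *user, uint32_t ref);
    void (*draw)(void *user, uint32_t first_vertex, uint32_t vertex_count);
    void *user;
} Recorder;

/* Records all draws in submission order and updates the stencil reference
   only when it differs from the previously recorded one.
   Returns the number of stencil reference updates that were recorded. */
size_t record_draws(const Recorder *rec, const DrawRange *draws, size_t count)
{
    assert(rec != NULL && (draws != NULL || count == 0));

    size_t updates = 0;
    uint32_t current_ref = 0;
    int have_ref = 0; /* nothing recorded yet, so the first draw always sets it */

    for (size_t i = 0; i < count; ++i) {
        if (!have_ref || draws[i].stencil_ref != current_ref) {
            rec->set_stencil_ref(rec->user, draws[i].stencil_ref);
            current_ref = draws[i].stencil_ref;
            have_ref = 1;
            ++updates;
        }
        rec->draw(rec->user, draws[i].first_vertex, draws[i].vertex_count);
    }
    return updates;
}

/* Stand-in callbacks that only count calls, so the sketch can be exercised
   without a Vulkan device. */
typedef struct {
    size_t ref_sets;
    size_t draws;
    uint32_t last_ref;
} CallLog;

void log_set_ref(void *user, uint32_t ref)
{
    CallLog *log = (CallLog *)user;
    log->ref_sets++;
    log->last_ref = ref;
}

void log_draw(void *user, uint32_t first_vertex, uint32_t vertex_count)
{
    (void)first_vertex;
    (void)vertex_count;
    ((CallLog *)user)->draws++;
}
```

With the counting callbacks, a Recorder can be filled as `{ log_set_ref, log_draw, &log }` and passed to record_draws together with an array of DrawRange entries. How much this filtering saves depends entirely on how often neighbouring draws share a reference value, so it does not answer the question of which option is faster.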
I hesitate to use the GL_EXT_shader_framebuffer_fetch solution unless it is absolutely necessary, because it is not available on all platforms, discard might be expensive on the GPU, and I have to take additional care with how the framebuffer is used. So I’d be quite happy if Vulkan/VkDynamicState made it superfluous.