Load Distribution and Storage Buffer Performance?

Tekmatek · November 5, 2016, 9:09am

Hey guys!

Sooo, a new day, a new problem. :neutral:

I’ve been porting an OpenCL Kernel to Vulkan. This Kernel does ray-tracing on a huge set of Data (between 1 and 2 GB) and then renders the result to the screen. Nothing else.

This is how the Vulkan port works:
[ul]
[li]Render 2 Triangles as a Quad in Fullscreen[/li]
[li]Do all the Ray-Tracing in the Fragment Shader (A very large Fragment Shader, >5000 Instructions in the compiled version). The Shader is pretty much a 1-to-1 port of the working OpenCL Kernel[/li]
[li]I use a large Storage Buffer to pass all the Data to the fragment shader[/li]
[/ul]

Functionally it works in Vulkan. But I am seeing a huge performance difference.
The OpenCL version runs at 20 FPS, whereas the Vulkan version runs at <1 FPS.

I have a few rough Ideas I would like to try, but don’t know how:

[ol]
[li]Can I make the Storage Buffer read-only somehow? I have this feeling that the slow speed might be due to the GPU putting in barriers in case of a write command. But I only need to read anyway.[/li]
[li]Can I manually assign / group how many units work on the Fragment Shader? I know this can be done with Compute Shaders and in OpenCL, so why not here? I tried splitting the rendering area into more triangles, but that seemed to slow things down even further… I might be doing it wrong, though.[/li]
[li]I get a VK_ERROR_INVALID_SHADER_NV error from vkCreateGraphicsPipelines when my Shader code has too many branches (at least, that is what my testing showed). Any ways around this? The same code works fine in OpenCL.[/li][/ol]

My last resort would be to use a compute shader instead of the fragment shader, but I would like to avoid this option if not absolutely necessary. (Plus, I don’t even know whether the same problems would show up there as well.)

Thanks again!

Alfonse_Reinheart · November 5, 2016, 10:04am

Can I make the Storage Buffer read-only somehow?

Yes. In GLSL, you would apply the readonly memory qualifier to the SSBO's buffer block declaration. In SPIR-V directly, you would use the NonWritable decoration.
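As a minimal sketch of what that declaration could look like in a Vulkan GLSL fragment shader: the block name `SceneData`, the instance name `scene`, the `vec4` element type, and the set/binding numbers are all assumptions for illustration, since the actual data layout is not shown in this thread.

```glsl
#version 450

// Hypothetical block: names, element type, and set/binding are assumptions.
// `readonly` is a memory qualifier placed before the `buffer` keyword.
layout(std430, set = 0, binding = 0) readonly buffer SceneData {
    vec4 voxels[];   // runtime-sized array; only reads are permitted
} scene;

layout(location = 0) out vec4 outColor;

void main() {
    // Reading is fine. A write such as `scene.voxels[0] = vec4(0.0);`
    // would be rejected at compile time because of `readonly`.
    outColor = scene.voxels[0];
}
```

When this is compiled to SPIR-V, the qualifier should show up as the NonWritable decoration mentioned above, which you can check by disassembling the module.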

I have this feeling that the slow speed might be due to the GPU putting in barriers in case of a write command.

I rather doubt that, since Vulkan doesn’t implicitly put barriers of any kind in a shader. Of course, that depends on whether you’re working directly with SPIR-V or GLSL-generated SPIR-V (the latter will put some in, depending on the dictates of GLSL).

But it should be quick to try it.

Can I manually assign / group how many units work on the Fragment Shader? I know this can be done with Compute Shaders and in OpenCL, so why not here?

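Because compute operations are for computing, while fragment shaders process fragments generated by a primitive rasterizer. How those fragments get grouped and scheduled is decided by the rasterizer and the implementation, so there is no workgroup size for you to set.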

My last resort would be to use a compute shader instead of the fragment shader, but I would like to avoid this option if not absolutely necessary.

Why would you want to use an FS for this at all? This sounds like prime compute shader territory to me. Indeed, circumstances like yours are why compute shaders were invented in the first place: so that you don’t have to use a rendering operation to pretend to compute values.
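For comparison, here is a rough sketch of how the same work might be laid out as a compute shader, where the workgroup size is declared explicitly. The 8x8 group size, the buffer and image names, the `rgba8` format, and the bindings are illustrative assumptions, not values taken from the original kernel.

```glsl
#version 450

// Explicit workgroup size, analogous to the local work size in OpenCL.
// 8x8 is an illustrative choice, not a tuned recommendation.
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

// Hypothetical bindings: the read-only data buffer and an output image.
layout(std430, set = 0, binding = 0) readonly buffer SceneData {
    vec4 voxels[];
} scene;
layout(set = 0, binding = 1, rgba8) uniform writeonly image2D outImage;

void main() {
    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    ivec2 size  = imageSize(outImage);

    // The dispatch is rounded up to whole workgroups, so invocations
    // that land outside the image simply exit.
    if (pixel.x >= size.x || pixel.y >= size.y) {
        return;
    }

    // Placeholder for the ray-tracing loop ported from the OpenCL kernel.
    vec4 color = scene.voxels[0];
    imageStore(outImage, pixel, color);
}
```

On the host side this would be launched with vkCmdDispatch, using group counts of the image width and height each divided by 8 and rounded up. The result image then has to be copied or blitted to the swapchain image (or sampled in a trivial fullscreen pass) to reach the screen.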