Suggestions on profiling headless Vulkan compute?

Hello everyone,

I have a headless Vulkan compute application that dispatches a single compute shader in its pipeline. I’d like to profile that compute shader and improve its performance, but right now I’m having trouble choosing tools. Does anybody have good suggestions?

To give more detail, I’d like to understand the compute shader’s low-level characteristics, such as the timing of the generated hardware ISA, register/memory usage, and barrier overhead. This pretty much means I need to look into vendor-specific tools like Nsight and RGP. But IIUC, at the moment they are all graphics- and frame-oriented; my application has no frames and completes very quickly, so I can’t even capture anything. Is there a programmatic way to perform captures with these tools?

To my knowledge, RenderDoc provides an API that I can use to do captures programmatically, and it has nice integration with RGP. But I’m not able to get instruction timing information out of RGP via the RenderDoc API. I might be missing something, but I think RGP instruction timing still requires performing captures the traditional, frame-based graphics way?

I’ve also checked tools like Tracy. It’s awesome and comprehensive, but the information I can get stops at the shader level; it gives no insight into the shader itself.

So to summarize, my questions are:

  1. Is there a programmatic way to do shader-level profiling for Vulkan compute? What kind of tools should I look into?
  2. If not, what tricks can I use to make Nsight/RGP/etc. work better with headless Vulkan compute applications?

Thanks in advance!