Understanding fragment shader invocation count

I’m using pipeline statistics queries to get shader invocation counts for a simple project. The vertex shader counts match my expectations (around 20k), but I’m getting some surprising results for fragment shader invocations. In theory, they should be around 550k. In practice, I tested on 3 GPUs and got:

  • NVIDIA GeForce MX250: ~550k
  • Intel Iris Plus: ~650k
  • AMD Vega 64: ~1.3M

Is there some way to figure out why the numbers differ?

I’ve read about “helper fragment shader invocations”, but could there be enough of them to more than double the total number of invocations?

I’ve also thought of multisampling, but as far as I can tell, the AMD drivers are set to “Application Settings” and I’m not doing anything in my code to request multisampling. Is it possible that the driver does it anyway? Is there some way to tell?

Context: the scene is relatively simple: a grid of 2D characters and background gradients. I’m measuring the invocations to check the effects of some optimizations. I wouldn’t compare numbers across devices anyway, but the differences made me curious.

Assuming your GL state, data, and shaders are identical across these test cases…

I would use the pipeline statistics queries to inspect the counters further up the pipeline for clues; that may suggest where the difference comes from. For instance, assuming this is a simple vertex+fragment shader setup (no tessellation or geometry shaders):

  • GL_PRIMITIVES_SUBMITTED
  • GL_CLIPPING_INPUT_PRIMITIVES
  • GL_CLIPPING_OUTPUT_PRIMITIVES
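
One way to use those counters is to walk them in pipeline order and look for the first stage where two GPUs disagree. Below is a minimal sketch of that comparison; the helper `first_divergence`, the 5% tolerance, and the primitive counts are all hypothetical and only for illustration (only the two fragment counts echo the numbers reported above).

```python
# Counter names mirror the GL pipeline statistics queries listed above.
# The helper, the tolerance, and the primitive counts are hypothetical.
PIPELINE_ORDER = [
    "PRIMITIVES_SUBMITTED",
    "CLIPPING_INPUT_PRIMITIVES",
    "CLIPPING_OUTPUT_PRIMITIVES",
    "FRAGMENT_SHADER_INVOCATIONS",
]


def first_divergence(stats_a, stats_b, order=PIPELINE_ORDER, tolerance=0.05):
    """Return the first counter whose relative difference exceeds tolerance."""
    for name in order:
        a, b = stats_a[name], stats_b[name]
        baseline = max(a, b)
        if baseline and abs(a - b) / baseline > tolerance:
            return name
    return None


# Made-up readings: identical geometry counters, different fragment counts.
gpu_a = {
    "PRIMITIVES_SUBMITTED": 6000,
    "CLIPPING_INPUT_PRIMITIVES": 6000,
    "CLIPPING_OUTPUT_PRIMITIVES": 6000,
    "FRAGMENT_SHADER_INVOCATIONS": 550_000,
}
gpu_b = dict(gpu_a, FRAGMENT_SHADER_INVOCATIONS=1_300_000)

print(first_divergence(gpu_a, gpu_b))
```

If the geometry counters agree and only the fragment counter differs, the cause is more likely in rasterization or fragment processing than in clipping.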

It is suspicious that the discrepancy between AMD and NVIDIA / Intel is ~2X.

Random thought: I wonder if AMD is counting clear fragments and NVIDIA isn’t. Something like that.

Or perhaps there’s a lot more overdraw in the AMD case that isn’t prevented by hierarchical/early depth and/or stencil tests?

It’s hard to believe that differences in clipping effectiveness could explain this, but I don’t know anything about the distribution of your rendered objects w.r.t. the bounds of the view frustum/window. To eliminate that as a factor, make sure everything you’re rendering is within the window bounds and within the near/far clip planes. Also, to eliminate the hierarchical/early depth tie-in, disable depth and stencil tests. And for good measure, disable blending.

It’s also hard to explain this with helper invocations, unless maybe you have very tiny triangles, and the GPU rasterization tile sizes are significantly different.
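
To see why tiny triangles matter, here is a rough sketch that compares the pixels a triangle actually covers with the count you would get if every touched 2x2 pixel quad ran all four invocations. It is a toy software rasterizer, not a model of any particular GPU: it assumes 2x2 quads, samples at pixel centers, ignores the top-left fill rule, and assumes the driver counts helper invocations in the statistic (which is implementation-dependent). All function names are made up for this example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: its sign tells which side of edge a->b point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(tri, width, height):
    """Return the set of (x, y) pixels whose centers lie inside the triangle."""
    (ax, ay), (bx, by), (cx, cy) = tri
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(ax, ay, bx, by, px, py)
            w1 = edge(bx, by, cx, cy, px, py)
            w2 = edge(cx, cy, ax, ay, px, py)
            # Accept either winding; edges are treated as inclusive
            # (a simplification, real hardware uses a fill rule).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.add((x, y))
    return covered


def invocation_counts(tri, width, height):
    """Return (covered pixels, invocations if every touched 2x2 quad runs fully)."""
    pixels = covered_pixels(tri, width, height)
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels), 4 * len(quads)


tiny = ((1.0, 1.0), (3.8, 1.0), (1.0, 3.8))
big = ((0.0, 0.0), (16.0, 0.0), (0.0, 16.0))

print(invocation_counts(tiny, 16, 16))  # (3, 12): 3 covered pixels, 3 quads touched
print(invocation_counts(big, 16, 16))
```

The tiny triangle covers 3 pixels that fall into 3 different quads, so quad-granular counting reports 4x the covered area, while the large triangle only pays for partial quads along its edges.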

And multisampling is multisampling: at most 1 fragment shader execution per pixel for each primitive covering it. To get something different like supersampling (1 fragment shader execution per sample), you’d need to be using ARB_sample_shading or some vendor-specific feature like NVIDIA’s variable-rate shading.

Wow, thank you, this is awesome.

I think you’re on to something with this one:

“It’s also hard to explain this with helper invocations, unless maybe you have very tiny triangles, and the GPU rasterization tile sizes are significantly different.”

The scene consists of a lot of small, non-overlapping triangles: something like 6000 small triangles in 2D. Its overall on-screen size is around 900x600 pixels, so the average triangle works out to about 90 pixels, but in reality there are many tiny ones and maybe 1000 bigger ones. (Read below for why I’m doing this.)
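
As a quick sanity check on those figures, the arithmetic can be written out as below. The inputs are the approximate numbers quoted in this thread; the variable names are just for this sketch.

```python
# Approximate figures from the thread.
width, height = 900, 600
triangle_count = 6000

covered_area = width * height                # pixels drawn with no overdraw
average_triangle = covered_area / triangle_count

reported = {
    "NVIDIA GeForce MX250": 550_000,
    "Intel Iris Plus": 650_000,
    "AMD Vega 64": 1_300_000,
}

print(covered_area, average_triangle)
for gpu, invocations in reported.items():
    # Ratio of reported fragment invocations to the covered screen area.
    print(gpu, round(invocations / covered_area, 2))
```

A ratio near 1 means roughly one invocation per covered pixel; a ratio well above 1 points to extra invocations beyond the visible area.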

I’ll read more about rasterization tiles, but I assume this means that all pixels in a tile go through the fragment shader, even if they don’t all end up in the triangle. So either those tiles are very different, or such wasted fragment shader invocations are counted differently.

The rest of the suggestions likely don’t apply. This is very simple: no tessellation, no geometry shaders, no depth/stencil, no overdraw, same depth, trivial projection, no culling or anything, no blending, no supersampling. So I bet this is it. I’ll play with the triangle sizes a bit later and see what happens.

Thank you so much for your reply!

P.S. Why so many tiny triangles: it was a beginner’s attempt at avoiding overdraw, actually. I draw variable-sized characters on fixed-sized cells, but the characters are in a packed texture atlas, so I can’t just texture sample the entire cell. So I draw the letter in the middle of the cell, but then I have to fill in the edges (with no texture sampling), so I use up to 8 tiny triangles for padding. Well, now I learned about depth testing, so I plan to use that instead, so I’ll just have 2 bigger triangles for the background and 2 for the letter. ¯\_(ツ)_/¯

Sure thing! Glad it gave you some ideas. BTW, sorry for the GL-centric response. At the time I didn’t notice this was in a Vulkan forum. Hopefully the mapping of this to Vulkan is fairly straightforward.

Also make sure your begin query / end query calls are “tightly” around the draw command you want to query statistics for.
I had an issue where they were not, and the statistics included the MSAA resolve and whatnot.

Replying to an old thread here… But for some reason only got mail notification today…
