Counting # of pixel writes


I want to get the total number of pixels written in a frame. The goal is to detect when the scene gets fill limited.

In order to do that, I thought of using the following:

begin occlusion query
render scene
end occlusion query

And then I get the number of pixels.
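Spelled out, the idea would be to wrap the scene in an ARB-style occlusion query — a minimal sketch, assuming an OpenGL 1.5 context (error handling omitted, `renderScene` is a placeholder for the frame's draw calls):

```c
/* Sketch: count the samples written during one frame with an occlusion
   query.  Assumes an OpenGL 1.5+ context; not a complete program. */
GLuint query;
GLuint samplesPassed;

glGenQueries(1, &query);

glBeginQuery(GL_SAMPLES_PASSED, query);
renderScene();                     /* placeholder: draw the whole frame */
glEndQuery(GL_SAMPLES_PASSED);

/* Reading the result blocks until the frame has finished rendering. */
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samplesPassed);
```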

Do you know if this incurs a performance penalty compared to just rendering the scene without the pixel counting?

Are there other solutions?


Very little if any performance penalty is incurred by simply starting and stopping a query. If you try to access the query before the entire scene stops rendering, then you might have a performance problem. But no more of one than you would get by calling SwapBuffers.

It’s unlikely that this query actually tells you what you need to know.

First, it only tells you the number of pixels actually written, not the number of fragments that were rasterized but then rejected by the Z or stencil test.

Second, even the number of written pixels isn’t a good measurement of VRAM traffic (“fill rate”) because the detail level and number of textures accessed play into this measurement.

Third, the complexity of your fragment program (or multi-texture set-up) greatly affects fill rate.

The best way to know whether you are fill rate limited or not is to render the scene at quarter resolution for a while, and see if it renders any faster. If it does, then you’re at least partially fill rate limited.

jwatte wrote:

It returns as its result the number of samples that pass the depth and stencil tests…
…so it should work well for this purpose (but I can’t add my own experience).


It returns as its result the number of samples that pass the depth and stencil tests…

Agreed. However, that is not a very good measurement of fill rate, and initially, the goal was stated as:

The goal is to detect when the scene gets fill limited.


All I’m saying is that detecting how many pixels pass depth/stencil is only a very gross estimate of how fill limited you are.


Yes, JWatte is right in saying that the number of pixel writes is not enough to detect when an application becomes fill limited. However, it is still a valuable indication.

It would be better to have the total number of pixels (or fragments) processed, i.e. both rejected and passed.

Maybe a new ARB extension?!

The direct application is to avoid frame-rate stepping by dynamically reducing the screen resolution (like Dynamic Video Resolution on SGI machines).

I don’t understand why you’d want this. Let’s pretend for a second that the driver gives you detailed statistics for a frame, say, 2M pixels total, 1M pixels passed, average overdraw was 2. Now … what?

How can you tell from these numbers whether or not you’re fill limited? IMO you can’t.

First, a crucial component is missing: time. Anything measured per frame is worthless information without knowing how long one frame takes.
Second, you’d need to know beforehand what your maximum possible fill rate is. Otherwise you have no frame of reference to compare against.

Why not follow the classic approach? If cutting your viewport in half (area-wise) roughly doubles your performance, you’re fill limited. No pixel counting required.

The goal is to have a steady frame rate, 60 Hz for example.
To achieve that, I can move the far clipping plane in or I can reduce the screen resolution.
My application uses many particle effects with lots of billboards. Since particles are moving all the time, everywhere, it is difficult to predict the required fill rate for a given viewpoint.

By counting the number of pixels and measuring frame time, you can detect that you are approaching dangerous limits and decide to reduce the screen resolution instead of moving the far clipping plane.

If you don’t count pixels, then you have to reduce the screen resolution just to check whether the problem is fill rate. If it is not, then you have introduced a visual artefact which wasn’t necessary.

I would only add that if you seek to find the approximate fill demand, knowledge of the sprite count could be sufficient; that, and an intuition about what you would expect to kill your fill rate.

For example, if I have 5 or 6 characters tight in the view, each with stenciled shadow volumes, and dynamic lights flying around, I don’t need a pixel count to know I’m fill limited. The idea is to tag the known cases of heavy fill demand and try to prevent them from occurring, or at least recognize when they will. If you don’t have this knowledge a priori, then some simple structural additions to your scene graph might help; the sprite/entity count below a certain node in the tree is one candidate. With something like this, you could do statistics on something besides pixels.

Also, sprite areas are easy to calculate. This won’t give you a pixel count, but a count that’s proportional, and that might do.

Anyway, I’ve babbled enough.

It sounds to me as if Golem has a problem something like:

He’s writing something like an effects editor for online compositing (say, for a TV station).

He uses a particle system.

Particles may be spawned by some real-time input (moving cursor around, and whatnot)

Thus, he can’t preview effects and run canned, proven-working effects that some artist already tweaked until it was done.

If you don’t have any Z testing, and use specific hardware whose statistics you know (i.e., you specify “always a Radeon X800 XT” or whatever), then using an occlusion query is probably going to be a pretty decent measurement for a limited problem such as the one stated above. You might also want to add the total size of all textures bound during the frame as an additional input to the estimate.

Another measurement might be to just add up all visible particles, weighted by one over distance from camera; this is similar to calculating area filled on the CPU, but a little lighter on the CPU cycle consumption.

If you cannot specify the hardware in use, then you have another problem: even if you get a good measurement of pixels rendered, you don’t know how much would be too much for the card in use at runtime. Thus, you’d have to start by doing some kind of profiling, so you know what to shoot for.

One additional caveat: reading back a query result means that you serialize the GPU up to the point of the query. Thus, you may get reduced CPU/GPU parallelism by doing it that way. That may be OK, depending on the application. Summing up all particles, weighted by one over distance to camera, causes no serialization, and if you’re GPU limited, the CPU may be there for “free”, so it’s worth considering.

Serializing the GPU up to the point of the occlusion query is not a problem since this happens at the very end of the rendering, just before swapping buffers. There is nothing more to send to the card until the start of the next video cycle. This blocking call is actually useful to measure frame time.

I could also wait for the next cycle to query the result anyway, so this does not worry me.
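Deferring the readback by a frame might look like this — a sketch, again assuming the OpenGL 1.5 query API (not a complete program):

```c
/* Sketch: poll instead of blocking, reading the previous frame's query
   only once the GPU has produced it.  Assumes an OpenGL 1.5+ context. */
GLuint samplesPassed;
GLint  available = 0;

glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &available);
if (available) {
    /* Result from the previous frame is ready; no serialization. */
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samplesPassed);
}
```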

FYI, I am in the training simulation business; the application is about rendering outdoor landscapes in real time. Stepping is not allowed, even when a vehicle drives through a smoke effect with lots of particles, for instance.