Should I turn off AA with fullscreen quads?

Korval · August 17, 2007, 2:20am

Originally posted by V-man:
If you draw a triangle to cover the entire screen, I think the hw will clip it and you will end up with 2 triangles still.
I would guess that it doesn’t actually clip triangles to the scissor box. It merely culls fragments outside the box.

My main question is this: if I use a quad, I am guaranteed pixel-perfect alignment. How can I get that guarantee when I’m going to get floating-point rounding error on interpolation?

zed · August 17, 2007, 3:05am

Wouldn’t the triangle have to be very large, compared to the size of the screen?
does it matter? surely clipping a small part of the tri ain't much better than clipping a lot of it. (my mention of adding an offset of one was more for safety with rounding)

I’m not sure if there’s a “best” method. As long as it covers the entire [-1, 1] range in clip space it’s fine. Two layouts I’ve been using are:
(-1, -1), (3, -1), (-1, 3)
and
(0, 2), (3, -1), (-3, -1)
related to the above, method A (the first layout) clips less than method B (the second), though ultimately does it matter? method A seems more natural as well, since you align the tri's edges with the screen's edges
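
To put numbers on the two layouts, here is a minimal Python sketch. It assumes "clips less" can be read as "less triangle area lying outside the [-1, 1] square"; the helper names (`edge`, `covers_screen`, and so on) are made up for illustration and come from no graphics API. It checks that each triangle contains all four screen corners and reports how much area falls off-screen.

```python
def edge(a, b, p):
    """Edge function: twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def triangle_area(tri):
    a, b, c = tri
    return abs(edge(a, b, c)) / 2.0

def covers(tri, p, eps=1e-9):
    """True when p is inside or on the boundary of tri, for either winding."""
    a, b, c = tri
    w = (edge(a, b, p), edge(b, c, p), edge(c, a, p))
    return all(v >= -eps for v in w) or all(v <= eps for v in w)

def covers_screen(tri):
    # A triangle is convex, so containing the four corners of the
    # [-1, 1] square means it contains the whole square.
    corners = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    return all(covers(tri, c) for c in corners)

METHOD_A = [(-1, -1), (3, -1), (-1, 3)]
METHOD_B = [(0, 2), (3, -1), (-3, -1)]
SCREEN_AREA = 4.0  # the [-1, 1] x [-1, 1] square

for name, tri in (("A", METHOD_A), ("B", METHOD_B)):
    area = triangle_area(tri)
    print(name, "covers screen:", covers_screen(tri),
          "area:", area, "off-screen area:", area - SCREEN_AREA)
```

Under that reading, method A has area 8 (4 off-screen) and method B has area 9 (5 off-screen). Both touch the top corners of the screen exactly on an edge, which may be why an extra offset "for safety with rounding" came up.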

sqrt[-1] mentioned something though about dividing the screen up into smaller quads (+ I think I've heard something similar as well). personally it doesn't seem logical (but what would I know, ok the card processes everything in chunks, so perhaps there's something to it)

in my game i have (at least)

draw fullscreen quad depth pass
draw fullscreen quad horizontal bloom
draw fullscreen quad vertical bloom
draw fullscreen quad horizontal bloom
draw fullscreen quad vertical bloom

draw fullscreen quad horizontal DOF
draw fullscreen quad vertical DOF
draw fullscreen quad horizontal DOF
draw fullscreen quad vertical DOF

plus i think particle buffer + another depth pass

so that's quite a few fullscreen(*) quads I'm drawing, so improving this even by a single percent is worthwhile, since it's a hell of a lot of pixels (more at lesser resolutions; then again the trend is for higher resolutions so the importance is less, though the counterpunch is that there's always some new postprocessing to add)

(*)note fullscreen is actually 1:1 or 1:4 or 1:16 sized depending on the rendering buffers mapping

btw wizard, 3 posts in 6 years, great,
still trying to decipher that last one though.

(rant mode)
deleted
(/rant)

(edit) actually this would be a great topic for a pdf from nvidia or amd, ‘how to draw a fullscreen quad’, which today is more pertinent than ever.

what’s funny is, there's no consensus here + it's prolly the simplest thing that a person can do in graphics

wizard · August 17, 2007, 3:35am

zed: Ain’t it great. I’ve been working, not posting. But I promise I’ll be writing more in the future, lol.

Korval: I’m sure clipping is done in any case. Rasterizing areas outside the viewport and then discarding them would be a waste of time.

Humus · August 17, 2007, 10:20am

Originally posted by sqrt[-1]:
I find this interesting as in some console hardware docs, they recommend doing the fullscreen passes in a grid of quads (6x8 tiles? - not sure)

Something about not flooding the fragment pipe or something… (and most consoles use PC-like hardware)

I’m not a console guy, but I believe those tiles are screenspace points, so they are rasterized as squares instead of as two triangles. You could try implementing something similar on PC with point sprites, but that would add some math to the shader for texture coordinate computation, so I’m not sure if that would be a gain.

_Fishman · August 17, 2007, 10:35am

Predicated Tiling.

Humus · August 17, 2007, 10:38am

Originally posted by V-man:
If you draw a triangle to cover the entire screen, I think the hw will clip it and you will end up with 2 triangles still.
Not unless it goes outside the guardband. It’s a bit old, but there’s a fairly good overview of how it works here:
http://developer.nvidia.com/object/Guard_Band_Clipping.html

Korval · August 17, 2007, 10:41am

I’m sure clipping is done in any case. Rasterizing areas outside the viewport and then discarding them would be a waste of time.
If clipping were happening, then there would not simply be one diagonal line as in the quad case; there would be many. Which would make this a totally meaningless idea from a performance standpoint.

Normally, clipping only happens if it is absolutely necessary. That is, if the polygon would break the plane of the camera.

Humus · August 17, 2007, 10:47am

Originally posted by Korval:
My main question is this: if I use a quad, I am guaranteed pixel-perfect alignment. How can I get that guarantee when I’m going to get floating-point rounding error on interpolation?
I really don’t think this would ever matter for anything. Not sure if you’d be “pixel-perfect aligned” with quads even. The triangle would be twice as large as the quad, so I assume at worst you lose one bit of precision.
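
As a rough illustration of the interpolation question, the Python sketch below interpolates texture coordinates across the oversized triangle (-1, -1), (3, -1), (-1, 3) and compares them with the coordinates a quad would give at each pixel center. Everything here is an assumption made for illustration: the vertex UVs (0, 0), (2, 0), (0, 2), the helper names, and above all the arithmetic, which is plain double-precision Python rather than whatever a GPU interpolator actually does. It cannot confirm or refute the "one bit" estimate; it only shows that the mapping itself is the same.

```python
def barycentric(tri, p):
    """Barycentric weights of point p with respect to triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    denom = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    w0 = ((by - cy) * (p[0] - cx) + (cx - bx) * (p[1] - cy)) / denom
    w1 = ((cy - ay) * (p[0] - cx) + (ax - cx) * (p[1] - cy)) / denom
    return w0, w1, 1.0 - w0 - w1

def interpolate_uv(tri, uvs, p):
    """Linearly interpolate per-vertex UVs at point p."""
    w = barycentric(tri, p)
    u = sum(wi * uv[0] for wi, uv in zip(w, uvs))
    v = sum(wi * uv[1] for wi, uv in zip(w, uvs))
    return u, v

# Assumed setup: UVs chosen so that the on-screen part maps to [0, 1].
TRI = [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)]
TRI_UV = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]

def max_uv_error(width, height):
    """Largest |triangle UV - quad UV| over all pixel centers."""
    worst = 0.0
    for j in range(height):
        for i in range(width):
            # Pixel center in normalized device coordinates.
            x = (i + 0.5) / width * 2.0 - 1.0
            y = (j + 0.5) / height * 2.0 - 1.0
            u, v = interpolate_uv(TRI, TRI_UV, (x, y))
            # What a quad with UVs in [0, 1] would produce here.
            eu, ev = (i + 0.5) / width, (j + 0.5) / height
            worst = max(worst, abs(u - eu), abs(v - ev))
    return worst

print(max_uv_error(64, 36))
```

In double precision the difference is down at rounding-noise level; real hardware uses its own interpolation precision, so treat this only as a check of the setup, not of the hardware.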

Lindley · August 17, 2007, 12:32pm

Well, I tried using a scissored triangle in place of a quad. It did show a very slight speedup. Thanks for the tip!

ZbuffeR · August 17, 2007, 3:06pm

This “Predicated Tiling” reminds me of … Tiled rendering on the PowerVR-based cards … been a long time.

dorbie · August 19, 2007, 12:30am

ZbuffeR, predicated tiling is a very old idea, you could go back to pixel planes and see it implemented.

Various contemporary architectures have similar styles of framebuffer management, but it has long been understood that it is not free.

http://www.cs.unc.edu/~pxfl/

ZbuffeR · August 19, 2007, 1:22pm

Thanks Dorbie, for the background info.

tarantula · August 20, 2007, 2:32am

The full screen Triangle vs Quad performance seems to be a bit better known in the GPGPU community. GPUBench has a test dedicated to this. You can see that using a full screen triangle is slightly faster. Check the third graph on this page for results on 7800GTX: http://graphics.stanford.edu/projects/gpubench/results/7800GTX-7772/

sqrt_1 · August 20, 2007, 8:29pm

I looked in console docs about the grid for fullscreen passes and it states that it can be better due to the GPU’s rasterization rules and minimizing texture cache misses. (8x1 grid seemed to be good for 1280x720)

Perhaps if someone is really keen they could write a test that cycles through a lot of different grid patterns for a given resolution to find the optimal one for different cards?

zed · August 21, 2007, 12:32pm

I'll do some tests tonight
so that's 8 quads of (1280/8) x 720
I'll try dividing it up vertically as well
also perhaps 4 triangles centered on the screen center is the way to go
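
For anyone wanting to cycle through grid patterns, a small Python sketch of the tile layout follows. `tile_grid` is a hypothetical helper, not part of GPUBench or any console SDK; it only computes the pixel rectangles, and drawing each one as its own quad is left to the renderer.

```python
def tile_grid(width, height, cols, rows):
    """Split a width x height target into cols x rows pixel rectangles.

    Returns a list of (x, y, w, h) tuples. Integer division places the
    seams, so tiles differ by at most one pixel when the size does not
    divide evenly, and together they cover the target exactly once.
    """
    tiles = []
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles

# The 8x1 layout mentioned for 1280x720: eight quads of 160 x 720.
for tile in tile_grid(1280, 720, 8, 1):
    print(tile)
```

Swapping the arguments, for example `tile_grid(1280, 720, 1, 8)`, gives the other split direction to compare against.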

zed · August 25, 2007, 4:33pm

there is something to splitting the screen up into smaller areas

using GPUBench
fpfilltest -r triangle -c1 -k 256 -n == ~1120m/pix
fpfilltest -r triangle -c1 -n == ~1090m/pix
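
For scale, the gap between those two figures can be worked out directly. This is only arithmetic on the two numbers as posted (units as written, approximate values); it does not rerun GPUBench or interpret its flags.

```python
def relative_gain(faster, slower):
    """Percentage by which `faster` exceeds `slower`."""
    return (faster - slower) / slower * 100.0

with_k256 = 1120.0     # reported for the run with -k 256
without_k256 = 1090.0  # reported for the run without -k

print(round(relative_gain(with_k256, without_k256), 2))  # prints 2.75
```

That is a bit under 3 percent, above the "single percent" threshold mentioned earlier in the thread, though both inputs are rounded.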