full screen quads : Instancing or GS ?

_blitz · February 11, 2011, 7:38am

Hi,

If I want to sum cells from different grids on the GPU (enable alpha blending, disable depth test, and draw full screen quads), two solutions come to mind for sending the quad vertices: either use instancing, or duplicate the quad vertices in a geometry shader. Is one of these techniques better (i.e. faster) than the other?
In my case, I need 5 quads at most.
Thanks,

B

ZbuffeR · February 11, 2011, 8:06am

5 quads is a very small batch, so there is no point in using a complex method; your bottleneck will probably be per-pixel raster operations.

Do you need to change the coordinates often? Can you reuse the same geometry for several quads?

_blitz · February 11, 2011, 8:19am

You’re absolutely right, I just have to use an index buffer with the quad vertices… So simple I hadn’t even thought about it ><

Thanks !
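
A minimal sketch of the index-buffer idea mentioned above, assuming one shared set of four corner vertices and the two-triangle index pattern repeated once per quad. The names `quad_vertices`, `INDICES_PER_QUAD` and `fill_quad_indices` are hypothetical and do not come from the thread; the winding order is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Four corners of a full screen quad in normalized device coordinates (x, y).
 * Hypothetical layout: counter-clockwise, starting at the bottom left. */
static const float quad_vertices[4][2] = {
    {-1.0f, -1.0f}, /* 0: bottom left  */
    { 1.0f, -1.0f}, /* 1: bottom right */
    { 1.0f,  1.0f}, /* 2: top right    */
    {-1.0f,  1.0f}, /* 3: top left     */
};

enum { INDICES_PER_QUAD = 6 }; /* 2 triangles * 3 indices */

/* Fill `out` with `quad_count` copies of the two-triangle pattern.
 * `out` must have room for quad_count * INDICES_PER_QUAD entries.
 * Returns the number of indices written. */
static size_t fill_quad_indices(unsigned short *out, size_t quad_count)
{
    static const unsigned short pattern[INDICES_PER_QUAD] = {0, 1, 2, 0, 2, 3};

    assert(out != NULL);
    for (size_t q = 0; q < quad_count; ++q) {
        for (size_t i = 0; i < INDICES_PER_QUAD; ++i) {
            out[q * INDICES_PER_QUAD + i] = pattern[i];
        }
    }
    return quad_count * INDICES_PER_QUAD;
}
```

With 5 quads this produces 30 indices over the same 4 vertices. The two arrays would then be uploaded with `glBufferData` and drawn with `glDrawElements(GL_TRIANGLES, 30, GL_UNSIGNED_SHORT, 0)`.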

Dark_Photon · February 11, 2011, 6:25pm

You don’t even need an index array. Can just use glDrawArrays( GL_QUADS, …). As ZbuffeR implied, the efficiency of the batch verts is likely irrelevant when you’re eating so much GPU/fill with each quad.

_blitz · February 12, 2011, 10:26am

A quick calculation :

Solution 1 (VBO only)
Each full screen quad = 2 triangles, so for 5 quads:
5 (quads) * 2 (triangles) * 3 (vertices) * sizeof(vertex)
My vertices are 2D positions stored as floats, so sizeof(vertex) = 8 bytes,
and finally sizeof(vbo) = 5 * 2 * 3 * 8 = 240 bytes

Solution 2 (VBO + IBO)
vbo size = sizeof(quad) = 4 (vertices) * 2 (dimensions) * 4 (sizeof(float)) = 32 bytes
ibo size = 6 (indices per quad with GL_TRIANGLES) * 5 (quad count) * sizeof(ushort) = 60 bytes
So in total I’d need 32 + 60 = 92 bytes

So with solution 2 I have less memory consumption, and cache efficiency.

Now I know that the vertex shader will definitely not be the bottleneck for my batch, but still, I prefer solution 2 ^^
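
The arithmetic above can be checked with a short sketch. The helper names (`vertex_bytes`, `solution1_bytes`, `solution2_bytes`) are hypothetical; only the counts and sizes come from the post, and the code assumes a 4-byte `float` and a 2-byte `unsigned short`, as the post does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants mirroring the numbers used in the post. */
enum {
    FLOATS_PER_VERTEX  = 2, /* 2D position */
    VERTICES_PER_QUAD  = 4, /* shared corners, solution 2 */
    TRIANGLES_PER_QUAD = 2,
    VERTICES_PER_TRI   = 3
};

/* Size of one vertex; the 8 bytes quoted in the post assume a 4-byte float. */
static size_t vertex_bytes(void)
{
    assert(sizeof(float) == 4);
    return FLOATS_PER_VERTEX * sizeof(float);
}

/* Solution 1: every triangle vertex is stored explicitly in the VBO. */
static size_t solution1_bytes(size_t quad_count)
{
    return quad_count * TRIANGLES_PER_QUAD * VERTICES_PER_TRI * vertex_bytes();
}

/* Solution 2: one shared quad in the VBO plus 6 ushort indices per quad. */
static size_t solution2_bytes(size_t quad_count)
{
    size_t vbo = VERTICES_PER_QUAD * vertex_bytes();
    size_t ibo = quad_count * TRIANGLES_PER_QUAD * VERTICES_PER_TRI
               * sizeof(unsigned short);
    return vbo + ibo;
}
```

Under those size assumptions, `solution1_bytes(5)` and `solution2_bytes(5)` give the 240 and 92 bytes computed by hand.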

ZbuffeR · February 12, 2011, 10:37am

It would be great if you could post benchmark results comparing the two methods.

Alfonse_Reinheart · February 12, 2011, 10:53am

_blitz wrote: “So with solution 2 I have less memory consumption, and cache efficiency.”

The time it took you to type even this sentence into the computer, let alone the rest of your post, is not worth the time you “saved”. You could be rendering with immediate mode, using double-precision attributes, and it still wouldn’t make a bit of difference as far as performance goes.

You have put far more thought into this subject than is warranted. That’s why the 80/20 rule exists, and that’s why you should always benchmark before you optimize.

_blitz · February 14, 2011, 11:52pm

After benchmarking (Ubuntu 10.10 + Radeon 5650 @ Catalyst 11.11), it turns out there’s no difference between VBO, VBO + IBO, or even immediate mode for 5 quads. Not really surprising though, given how few vertices there are.

ZbuffeR · February 15, 2011, 12:21am

Even better when you learn it by yourself, right ?

system · February 15, 2011, 4:38am

I once spent 4 months trying to figure out what was the fastest way to shut down my program: destroy the context or make the context non-current or do a process kill or pull the plug from the computer or cut the power lines to the entire building. It cost 82 million dollars to find the best solution and now I forgot what it was.

BionicBytes · February 15, 2011, 6:08am

@V-man; you’re too funny man! LOL.

Dark_Photon · February 15, 2011, 6:36pm

Thanks man, I needed that laugh.