Full-screen quads: instancing or GS?


If I want to sum cells from different grids on the GPU (enable alpha blending, disable the depth test, and draw full-screen quads), two ways of sending the quad vertices come to mind: either use instancing, or duplicate the quad vertices in a geometry shader… Is one of these techniques better (i.e. faster) than the other?
In my case, I need 5 quads at most.
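For reference, here is a small CPU sketch of what that blending pass is meant to compute. It assumes the blend is purely additive (something like glBlendFunc(GL_ONE, GL_ONE) onto a cleared target, which the post does not actually state) and that the target does not clamp. The function name sum_grids and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference: out[i] becomes the sum of cell i over all grids.
 * This mirrors drawing one full-screen quad per grid with additive blending
 * (ASSUMPTION: GL_ONE/GL_ONE blend factors, target cleared to zero first).
 * grids: array of grid_count pointers, each pointing at cell_count floats. */
static void sum_grids(const float *const *grids, size_t grid_count,
                      size_t cell_count, float *out)
{
    assert(grids != NULL && out != NULL);
    for (size_t i = 0; i < cell_count; ++i) {
        float acc = 0.0f;                 /* cleared "framebuffer" value */
        for (size_t g = 0; g < grid_count; ++g) {
            acc += grids[g][i];           /* one blended quad per grid */
        }
        out[i] = acc;
    }
}
```

A reference like this can be handy for validating the GPU result on a small grid before worrying about how the quads are submitted.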


5 quads is very few, so there is no point in a complex method; your bottleneck will probably be the per-pixel raster operations.

Do you need to change the coordinates often? Can you reuse the same geometry for several quads, etc.?

You’re absolutely right, I just have to use an index buffer with the quad vertices… So simple I hadn’t even thought about it ><

Thanks !

You don’t even need an index array. You can just use glDrawArrays( GL_QUADS, …). As ZbuffeR implied, the efficiency of the vertex batch is likely irrelevant when each quad eats so much GPU fill.
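A minimal sketch of the non-indexed layout this suggests, assuming positions are given directly in normalized device coordinates and every quad covers the whole screen. The helper name fill_fullscreen_quads is hypothetical, and the GL call is shown only in a comment because it needs a live context. Note that GL_QUADS is only available in a compatibility profile; it was removed from the core profile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes quad_count full-screen quads into out as
 * (x, y) pairs, 4 corners per quad in counter-clockwise order.
 * out must have room for quad_count * 8 floats.
 * Returns the number of vertices written. */
static size_t fill_fullscreen_quads(float *out, size_t quad_count)
{
    /* Corners of the full screen in normalized device coordinates. */
    static const float corners[8] = {
        -1.0f, -1.0f,
         1.0f, -1.0f,
         1.0f,  1.0f,
        -1.0f,  1.0f
    };
    assert(out != NULL);
    for (size_t q = 0; q < quad_count; ++q) {
        for (size_t i = 0; i < 8; ++i) {
            out[q * 8 + i] = corners[i];
        }
    }
    return quad_count * 4;
}

/* With a compatibility-profile context current and this array set up as a
 * 2-float position attribute, the draw would look roughly like:
 *
 *     glDrawArrays(GL_QUADS, 0, (GLsizei)vertex_count);
 */
```

For 5 quads this is 20 vertices, i.e. 160 bytes of positions, with no index buffer at all.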

A quick calculation:

Solution 1 (VBO only)
Each full-screen quad = 2 triangles, so for 5 quads:
sizeof(vbo) = 5 (quads) * 2 (triangles) * 3 (vertices) * sizeof(vertex)
My vertices are 2D positions stored as floats, so sizeof(vertex) = 8 bytes
and finally sizeof(vbo) = 240 bytes

Solution 2 (VBO + IBO)
VBO size = sizeof(quad) = 4 (vertices) * 2 (dimensions) * 4 (sizeof(float)) = 32 bytes
IBO size = 6 (indices per quad with GL_TRIANGLES) * 5 (quads) * sizeof(ushort) = 60 bytes
So in total I’d need 32 + 60 = 92 bytes

So with solution 2 I have less memory consumption, and cache efficiency.

Now I know that the vertex shader will definitely not be the bottleneck for my batch, but still, I prefer solution 2 ^^
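The byte counts in the quick calculation above can be double-checked with a few lines of C. This is only a sketch: the Vertex2D struct and both function names are invented here, and it assumes 4-byte floats and 16-bit indices as in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vertex layout matching the post: a 2D float position. */
typedef struct { float x, y; } Vertex2D;

/* Solution 1: every quad stored as 2 triangles of 3 unshared vertices. */
static size_t vbo_only_bytes(size_t quad_count)
{
    /* Sanity check that the struct has no padding. */
    assert(sizeof(Vertex2D) == 2 * sizeof(float));
    return quad_count * 2 * 3 * sizeof(Vertex2D);
}

/* Solution 2: one shared 4-vertex quad plus 6 16-bit indices per quad. */
static size_t vbo_plus_ibo_bytes(size_t quad_count)
{
    size_t vbo = 4 * sizeof(Vertex2D);
    size_t ibo = quad_count * 6 * sizeof(uint16_t);
    return vbo + ibo;
}
```

With quad_count = 5 these give 240 and 92 bytes respectively, matching the figures in the post.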

It would be great if you could post benchmark results comparing the two methods.

“So with solution 2 I have less memory consumption, and cache efficiency.”

The time it took you to type even this sentence into the computer, let alone the rest of your post, is not worth the time you “saved”. You could be rendering with immediate mode, using double-precision attributes, and it still wouldn’t make a bit of difference as far as performance is concerned.

You have put far more thought into this subject than is warranted. That’s why the 80/20 rule exists, and that’s why you should always benchmark before you optimize.

After benchmarking (Ubuntu 10.10, Radeon 5650, Catalyst 11.11), it turns out there’s no difference between VBO, VBO + IBO, or even immediate mode for 5 quads. Not really surprising though, given how few vertices there are.

Even better when you learn it by yourself, right? :)

I once spent 4 months trying to figure out the fastest way to shut down my program: destroy the context, make the context non-current, kill the process, pull the plug from the computer, or cut the power lines to the entire building. It cost 82 million dollars to find the best solution, and now I’ve forgotten what it was.

@V-man, you’re too funny, man! LOL.

Thanks man, I needed that laugh. :D