Drawing a quad with GL3

Since all immediate-mode functions and client-side vertex arrays are deprecated, I wonder what the best way is to render a screen-space quad (there could be hundreds of them, e.g. for font rendering).

Basically I see two ways:

  • Having a VBO with 4 vertices where the vertex data (position, texcoords) is updated dynamically using glBufferData
  • A single VBO with 4 vertices but the vertex data (x,y,u,v) is given to the shader as a small array of uniforms and the shader resizes the quad based on the uniform data

The first way is certainly less complex (the second is also easy but needs a more specialized shader), but I wonder how much of a performance issue GPU synchronization will be. The same vertex array may be reused a hundred times, so the driver always has to wait until the VBO is no longer in use by the GPU.
There is an NVIDIA presentation which says that the driver can internally create a new VBO when glBufferData is used, but I don’t know whether all vendors do this optimization.

If you need a quick way to render small amounts of geometry, you can try out my immediate mode emulation: GLIM

Full source is provided, so you can check how I do it; the VBO part is pretty straightforward.

What I actually do is cache all incoming geometry in a client-side array, and when a batch is ready, I create a VBO using the STREAM_DRAW hint. I then immediately call DrawElements and delete the VBO right away.

I’d say you can’t make that much more efficient, as long as you don’t render the same data several times. To exploit that, you would need to know whether you have rendered something before (last frame) and can reuse it.
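The batching step described above could look roughly like the following. This is only a sketch of the CPU side, not GLIM's actual code: `QuadBatch`, `batch_add_quad`, `MAX_QUADS` and the vertex layout are hypothetical names chosen for illustration, and quads are emitted as two indexed triangles because GL_QUADS is not available in core GL3.

```c
#include <assert.h>
#include <stddef.h>

/* One vertex of a textured screen-space quad: position plus texcoord. */
typedef struct { float x, y, u, v; } QuadVertex;

/* Hypothetical batch capacity; 1024 quads = 4096 vertices, which still
   fits into 16-bit indices. */
enum { MAX_QUADS = 1024 };

typedef struct {
    QuadVertex     vertices[MAX_QUADS * 4];
    unsigned short indices[MAX_QUADS * 6];
    size_t         quad_count;
} QuadBatch;

static void batch_reset(QuadBatch *b) { b->quad_count = 0; }

/* Number of indices a draw call for the current batch would use. */
static size_t batch_index_count(const QuadBatch *b) { return b->quad_count * 6; }

/* Append one quad to the client-side arrays.
   Returns 1 on success, 0 when the batch is full and must be flushed. */
static int batch_add_quad(QuadBatch *b,
                          float left, float top, float right, float bottom,
                          float u0, float v0, float u1, float v1)
{
    assert(b != NULL);
    if (b->quad_count >= MAX_QUADS)
        return 0;

    size_t          q    = b->quad_count;
    QuadVertex     *vtx  = &b->vertices[q * 4];
    unsigned short *idx  = &b->indices[q * 6];
    unsigned short  base = (unsigned short)(q * 4);

    vtx[0] = (QuadVertex){ left,  top,    u0, v0 };
    vtx[1] = (QuadVertex){ left,  bottom, u0, v1 };
    vtx[2] = (QuadVertex){ right, bottom, u1, v1 };
    vtx[3] = (QuadVertex){ right, top,    u1, v0 };

    /* Two triangles per quad: (0,1,2) and (0,2,3). */
    idx[0] = base;
    idx[1] = (unsigned short)(base + 1);
    idx[2] = (unsigned short)(base + 2);
    idx[3] = base;
    idx[4] = (unsigned short)(base + 2);
    idx[5] = (unsigned short)(base + 3);

    b->quad_count = q + 1;
    return 1;
}

/* When a batch is ready, the arrays would be uploaded with
   glBufferData(..., GL_STREAM_DRAW) and drawn with a single
   glDrawElements(GL_TRIANGLES, batch_index_count(b), GL_UNSIGNED_SHORT, 0)
   call (GL calls omitted here so the sketch compiles without a context). */
```

With something like this, a few hundred glyph quads end up in one buffer upload and one draw call instead of one of each per quad.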

About your ideas:

  • Idea 2 (uniforms) is overkill; specifying data via uniforms should be done very, very rarely. Definitely not on a per-quad basis.

  • Idea 1 (updating the VBO with glBufferData) would still mean you need a draw call for each quad. Better than idea 2, IMO, but still not really good.

Jan.

Specifying data via uniforms is not a bad idea if you think about it a little bit…

Let’s say we have a vertex buffer which contains something like {(0,0), (0,1), (1,1), (1,0)}. It is bound all the time (no binding overhead, no geometry-store overhead in device memory).

Coordinates of a quad can be stored in one vec4 (left, top, right, bottom), and an array of quads can be uploaded as uniforms and rendered using instancing (ARB_draw_instanced). This gives us about 4096 rendered quads per draw call (size of G80’s constant memory). You can reconstruct the position and texture coordinates of each quad in a vertex shader.

Seeing that the constant memory is fast as hell and CPU overhead is minimal (alternating glUniform4fv and glDrawArraysInstancedARB), it seems to be a pretty fast method…
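To make the reconstruction step concrete, here is the per-vertex math written as plain C rather than GLSL, so it can be checked without a GL context. It is only a sketch: `QuadRect`, `expand_corner` and `quads_per_call` are hypothetical names, and the 64 KB figure passed to `quads_per_call` is an assumption about the constant memory size that yields the 4096 quoted above. In a real vertex shader, the rectangle would be picked from the uniform array by instance ID and the corner would come from the bound unit-quad vertex buffer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;

/* One quad packed the way a vec4 uniform would hold it. */
typedef struct { float left, top, right, bottom; } QuadRect;

/* The shared unit quad that stays bound all the time. */
static const Vec2 UNIT_QUAD[4] = {
    { 0.0f, 0.0f }, { 0.0f, 1.0f }, { 1.0f, 1.0f }, { 1.0f, 0.0f }
};

/* Scale and offset one unit-quad corner into the rectangle:
   corner (0,0) maps to (left, top), corner (1,1) to (right, bottom). */
static Vec2 expand_corner(QuadRect r, int corner)
{
    assert(corner >= 0 && corner < 4);
    Vec2 c = UNIT_QUAD[corner];
    Vec2 out;
    out.x = r.left + c.x * (r.right - r.left);
    out.y = r.top  + c.y * (r.bottom - r.top);
    return out;
}

/* How many vec4-sized quads fit into a given amount of constant memory. */
static size_t quads_per_call(size_t constant_bytes)
{
    return constant_bytes / sizeof(QuadRect);
}
```

Texture coordinates could be rebuilt the same way from a second rectangle per quad, at the cost of halving the number of quads per draw call.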

This is similar to a method I have been using for a while for Quads and billboards / “point sprites of infinite size”.

I actually store some of the vertex data in shaders, or in VBOs, but the Uniform method works too. As would attributes, or logic in a Geometry Shader.

Of course not all of us have working implementations of ARB_draw_instanced and have to find other ways around that, but don’t get me started on that! :wink:

Thanks for your answers!

I will probably reorder my rendering pipeline to make it easier to batch more elements together. That way I can reduce the number of glBufferData calls.