Advice needed for drawing many quads

Hi, I’m working on an engine for a game, and it’s my first time doing 3D. I’m currently working on the user interface, which is essentially a bunch of quads.

I’m measuring the time one frame (one iteration of the game loop) takes, as well as the fps. I’m also measuring the fps with Fraps. My own counter reports about 600 fps, whereas Fraps reports 56. If I remove my draw calls, Fraps jumps up to 80. So I’ve deduced that when I send draw commands to the GPU, it executes them asynchronously, whenever it gets to them, so that the rest of my program can continue.

I want to get the Fraps fps up to 80, which I believe is my limit due to the refresh rate of my monitor.

Currently I draw each of the roughly 100 quads with its own glDrawElements call. Between some of the calls I update glScissor, because some elements are contained within other elements, so the bounds of the parent element are set as the scissor rectangle.

Other than that, each quad has a position and dimensions, as well as the region of a shared texture that it will be textured with.

What would be the fastest way to draw some number of textured quads while still keeping the glScissor behaviour?

I thought about drawing the same quad repeatedly with glDrawElementsInstanced and using one or more instanced vertex attributes to hold the positions and UV coords. But then I’d need something else for the glScissor effect, or would it be best to do the scissoring manually on the CPU? If not, what’s the fastest way to get that data to the GPU?

Texture buffer object? Uniform array?

Basically, I’ve determined that the bottleneck isn’t on the client side, it’s on the GPU side, so I figure I’m sending the data to the GPU in a sub-optimal way. I want to know the best way.

I thought about loading a texture buffer with the position, dimensions, texture region and glScissor rectangle of each quad for the vertex and fragment shaders to handle, but that requires updating a buffer, so wouldn’t it be the same as just updating instanced vertex buffers anyway? Rather than computing everything and adjusting vertices in the vertex shader, I could do it on the CPU and just have the position as an instanced array, or just update one bigass VBO? (I think the glScissor part would definitely need to go on the fragment side, though?)

Am I right to assume that the issue is that I’m making 100 draw requests when I should just make 1?

The fastest way would be to drop the glScissor calls so that you can do it all in one draw call.

You can either update the vertex data for each quad so that it’s already clipped to its parent, or you could attach the scissor rectangle as a (flat-shaded) vertex attribute and perform the clipping in one of the shaders. The easiest approach would be to have the fragment shader perform the scissor test and execute “discard” if it fails. Implementing it in the vertex shader may be more efficient, but each vertex would need enough information to be able to calculate the clipped texture coordinates. Instanced rendering would probably help here.
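As a rough illustration of the first option (clipping each quad to its parent on the CPU before uploading it), here is a minimal C++ sketch. The `Rect` and `Quad` types and the `clipQuad` function are hypothetical names made up for this example, and it assumes axis-aligned quads with non-negative sizes. The texture region is shrunk in proportion so the visible part of the image does not shift or stretch.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned rectangle: origin (x, y) plus size (w, h). Hypothetical type.
struct Rect { float x, y, w, h; };

// A UI quad: where it sits on screen and which texture region it samples.
struct Quad { Rect pos; Rect uv; };

// Clips `q` to `clip` and shrinks the texture region by the same proportion.
// Returns false if nothing is left to draw; `out` is left untouched then.
bool clipQuad(const Quad& q, const Rect& clip, Quad& out) {
    assert(q.pos.w >= 0.f && q.pos.h >= 0.f);  // assumption: no flipped quads

    // Intersection of the quad with the clip rectangle.
    const float x0 = std::max(q.pos.x, clip.x);
    const float y0 = std::max(q.pos.y, clip.y);
    const float x1 = std::min(q.pos.x + q.pos.w, clip.x + clip.w);
    const float y1 = std::min(q.pos.y + q.pos.h, clip.y + clip.h);
    if (x1 <= x0 || y1 <= y0) return false;  // fully outside, or zero-sized

    // Texture units per screen unit along each axis.
    const float sx = q.uv.w / q.pos.w;
    const float sy = q.uv.h / q.pos.h;

    out.pos = {x0, y0, x1 - x0, y1 - y0};
    out.uv  = {q.uv.x + (x0 - q.pos.x) * sx,
               q.uv.y + (y0 - q.pos.y) * sy,
               (x1 - x0) * sx,
               (y1 - y0) * sy};
    return true;
}
```

Quads for which `clipQuad` returns false can simply be left out of the batch. This only works for axis-aligned quads; a rotated quad would need real polygon clipping or a per-fragment test instead.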

Probably; making 100 draw calls where one would do is the likely issue. Once you fix that, you’ll probably find that you don’t need to worry about any other optimisations; 100 quads is such a trivial workload that you can just use whichever approach is most convenient for the client side.

Assuming that these quads are basically sprites, that’s likely to be per-instance attributes containing origin+size, texture origin+size, clip rectangle, and any other per-sprite data. The only per-vertex data would be the vertex position in quad space (i.e. from 0,0 to 1,1). Then to move a sprite you only need to update one per-instance value rather than four vertices.
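To make that layout concrete, here is a minimal C++ sketch of what the per-instance record and the per-vertex unit quad might look like on the client side. All of the names (`SpriteInstance`, `kUnitQuad`, `cornerOf`, `expand`) are hypothetical, and `expand` is only a CPU mirror of the arithmetic a vertex shader would do, not actual shader code.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// One record per sprite. Intended to be uploaded as per-instance attributes
// (attribute divisor of 1), so it advances once per quad, not once per vertex.
struct SpriteInstance {
    Vec2 origin, size;        // screen-space rectangle
    Vec2 texOrigin, texSize;  // region of the shared texture
    float clip[4];            // parent clip rectangle: x, y, w, h
};

// Tightly packed, so the array can be copied into a buffer object as-is.
static_assert(sizeof(SpriteInstance) == 12 * sizeof(float),
              "SpriteInstance is expected to be 12 packed floats");

// The only per-vertex data: the corner in quad space, from (0,0) to (1,1).
constexpr Vec2 kUnitQuad[4] = {{0.f, 0.f}, {1.f, 0.f}, {1.f, 1.f}, {0.f, 1.f}};

Vec2 cornerOf(int vertexIndex) {
    assert(vertexIndex >= 0 && vertexIndex < 4);
    return kUnitQuad[vertexIndex];
}

struct ExpandedVertex { Vec2 position, uv; };

// CPU mirror of the per-vertex work: scale the unit corner by the sprite's
// size and texture size, then offset by the respective origins.
ExpandedVertex expand(const SpriteInstance& s, Vec2 corner) {
    return {{s.origin.x + corner.x * s.size.x,
             s.origin.y + corner.y * s.size.y},
            {s.texOrigin.x + corner.x * s.texSize.x,
             s.texOrigin.y + corner.y * s.texSize.y}};
}
```

With a layout along these lines, the whole UI would go out in a single glDrawElementsInstanced call: six indices for the unit quad, and an instance count equal to the number of sprites. Each instanced attribute would be marked with glVertexAttribDivisor(index, 1).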

Small nitpick: vertex attributes are not flat-shaded; only varyings can be flat-shaded.

Another way to implement clipping of primitives is to use (drumroll) vertex clipping. Just write the four signed distances between the current vertex coord and the scissor rectangle sides into gl_ClipDistance and you don’t need instancing or texture coord calculations; vertex clipping will do all this for you.
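The four distances are easy to get backwards, so here is a small C++ sketch of the arithmetic, written on the CPU purely for illustration. `clipDistances` and `insideAll` are hypothetical helpers; in practice the same four expressions would sit in the vertex shader and be written to gl_ClipDistance[0] through gl_ClipDistance[3], with each plane enabled on the client via glEnable(GL_CLIP_DISTANCE0 + i). A distance is positive on the side that should be kept.

```cpp
#include <array>
#include <cassert>

struct Vec2 { float x, y; };
struct Rect { float x, y, w, h; };  // origin plus size

// Signed distance from vertex `p` to each side of `clip`, positive inside.
// Order: left, right, bottom, top.
std::array<float, 4> clipDistances(Vec2 p, const Rect& clip) {
    assert(clip.w >= 0.f && clip.h >= 0.f);  // assumption: well-formed rect
    return {{p.x - clip.x,
             (clip.x + clip.w) - p.x,
             p.y - clip.y,
             (clip.y + clip.h) - p.y}};
}

// A vertex survives unclipped only if no distance is negative.
bool insideAll(const std::array<float, 4>& d) {
    for (float v : d) {
        if (v < 0.f) return false;
    }
    return true;
}
```

Because the hardware interpolates these distances across the primitive and cuts it where they reach zero, the texture coordinates of the new vertices are interpolated along with everything else, which is why no manual UV adjustment is needed.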

“Implementing it in the vertex shader may be more efficient, but each vertex would need enough information to be able to calculate the clipped texture coordinates.”

More efficient, possibly, but there are significant problems here. You’d effectively be doing clipping rather than scissoring, and vertex shaders are kinda bad at that, especially if all the vertices of a quad are outside of the clip rectangle. This is even worse if you have a rotated quad, since now you have to find some way for a VS to be able to discard or hide a triangle.