Best practice regarding uploading/reuploading/keeping track of vertices?

Ploppz · June 10, 2015, 3:21pm

Hi.

I’m quite new to OpenGL, and I just started out by making a simple text renderer. It generates a texture atlas, and there is a function that generates a vertex array (triangles with respective texture coordinates), given a string, which it then uploads and renders.

My question is how I should best deal with redrawing and changing the position of text.

Should I keep a “model” of each text that should be drawn, in memory - string, position - and redraw by completely regenerating the vertex array and uploading it again for every change?

Or should I, once I generate vertices for a string, keep them in the array until that string is removed from the scene? I suppose that, in this case, displacement would happen in the shader by sending a matrix. The difficulty with this approach would be mapping every text we want to draw to a certain position and length in the vertex array. Does OpenGL provide any functionality to make this easier? Would I have to reupload the entire array every time I add some new text to the scene? Is DMA a solution?
But a thought that arises now is: if all the vertices are in the same array, how would I apply different translations to them? One answer that comes to mind is to loop over the strings, setting the uniform matrix and drawing a subset of the vertex array for each string that is to be drawn. Is this normal to do?

Generally, I have a lot of questions regarding how to structure my draw calls and data in a nice way as my program/game grows more complex: a growing number of distinct objects that all need special treatment, and that may also change.

Thanks!

GClements · June 10, 2015, 8:41pm

The first thing to consider is how many different “items” of text there are, in the sense of distinct transformations; a paragraph of text, where all of the glyphs maintain the same relative position, would be a single item.

If there are only a few, then one draw call per item, with the transformation in a uniform, would be a reasonable approach.
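As a rough illustration of the one-draw-call-per-item approach, the bookkeeping could look like the CPU-side sketch below. It is not OpenGL code: `TextBatch`, `TextItem`, the x/y/u/v vertex layout, and the placeholder unit-quad glyph geometry are all assumptions made for the example. Each returned tuple stands in for setting the transform uniform and then calling `glDrawArrays(GL_TRIANGLES, first, count)`.

```python
from dataclasses import dataclass

FLOATS_PER_VERTEX = 4    # x, y, u, v (assumed layout)
VERTICES_PER_GLYPH = 6   # two triangles per glyph quad


@dataclass
class TextItem:
    first: int      # index of the item's first vertex in the shared array
    count: int      # number of vertices belonging to the item
    offset: tuple   # (dx, dy) translation, sent as a uniform at draw time


class TextBatch:
    """Hypothetical bookkeeping for several strings in one vertex array."""

    def __init__(self):
        self.vertices = []   # flat list of floats, uploaded when text is added
        self.items = {}      # name -> TextItem

    def add(self, name, text, offset=(0.0, 0.0)):
        first = len(self.vertices) // FLOATS_PER_VERTEX
        for i, _ch in enumerate(text):
            # Placeholder geometry: a unit quad per glyph, advancing along x.
            # A real renderer would use glyph metrics and atlas coordinates.
            x0, x1 = float(i), float(i + 1)
            quad = [(x0, 0, 0, 0), (x1, 0, 1, 0), (x1, 1, 1, 1),
                    (x0, 0, 0, 0), (x1, 1, 1, 1), (x0, 1, 0, 1)]
            for v in quad:
                self.vertices.extend(float(c) for c in v)
        count = len(text) * VERTICES_PER_GLYPH
        self.items[name] = TextItem(first, count, offset)

    def draw_commands(self):
        # One (offset, first, count) tuple per item, in insertion order.
        return [(it.offset, it.first, it.count) for it in self.items.values()]


batch = TextBatch()
batch.add("title", "Hi")
batch.add("score", "123", offset=(10.0, 5.0))
batch.items["score"].offset = (12.0, 5.0)   # moving text touches no vertices
commands = batch.draw_commands()
```

Moving an item only changes its stored offset, so the vertex array does not need to be regenerated or reuploaded for a pure translation.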

If there are many distinct items, then it would probably be better to use an array of transformations and add an extra integer attribute to hold the index of the transformation. That way, all of the text can be coalesced into a single draw call and you only need to change the transformations.
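The idea of an extra integer attribute can be sketched on the CPU as follows. This is a simplified stand-in, not shader code: plain 2D translations replace full matrices, and `build_vertices` and `apply_transforms` are hypothetical names. In an actual program the lookup would happen in the vertex shader, with the translations held in something like a uniform array and the index supplied as an integer vertex attribute.

```python
def build_vertices(items):
    """items: one list of (x, y) points per text item.

    Returns (x, y, item_index) tuples; the index selects the transformation.
    """
    vertices = []
    for index, points in enumerate(items):
        for x, y in points:
            vertices.append((x, y, index))
    return vertices


def apply_transforms(vertices, translations):
    """CPU stand-in for the vertex shader: position + translations[index]."""
    out = []
    for x, y, index in vertices:
        dx, dy = translations[index]
        out.append((x + dx, y + dy))
    return out


items = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 1.0)]]
vertices = build_vertices(items)          # built once, never modified below
translations = [(0.0, 0.0), (5.0, 2.0)]
moved = apply_transforms(vertices, translations)

translations[1] = (6.0, 2.0)              # move only the second item
moved_again = apply_transforms(vertices, translations)
```

Only the small `translations` list changes between the two passes; the vertex data stays as it was, which is what lets all of the text share one draw call.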

If you take the approach of moving text by changing the vertex array, then the main thing is to keep the vertex positions and texture coordinates separate rather than interleaving them.

Also, for “sprites” in general, you can reduce the size of the vertex arrays using instanced rendering. Each instance consists of one quad (or a pair of triangles). All of the per-sprite data is instanced so that each sprite only needs one set of attributes rather than 4. Assuming that the sprites are axis-aligned, each one needs at most position and size in screen space and a position and size in texture space. Effectively, you’re providing the position and texture coordinates for two of the vertices, and calculating the values for the other two in the vertex shader, halving the amount of data.

You may be able to further reduce the amount of data by using a single index to identify the sprite, with the position and size within the texture atlas obtained from a lookup table.

Alfonse_Reinheart · June 10, 2015, 9:54pm

[QUOTE=GClements;1267565]Also, for “sprites” in general, you can reduce the size of the vertex arrays using instanced rendering. Each instance consists of one quad (or a pair of triangles). All of the per-sprite data is instanced so that each sprite only needs one set of attributes rather than 4. Assuming that the sprites are axis-aligned, each one needs at most position and size in screen space and a position and size in texture space. Effectively, you’re providing the position and texture coordinates for two of the vertices, and calculating the values for the other two in the vertex shader, halving the amount of data.

You may be able to further reduce the amount of data by using a single index to identify the sprite, with the position and size within the texture atlas obtained from a lookup table.[/QUOTE]

Um, is there any evidence that this form of sprite instancing improves performance in any meaningful way? Sure, it reduces the size of memory storage. But you’re not talking about very much memory to begin with. A sprite could reasonably take 32 bytes (4 vertices, each with a position and a texture coordinate of 2 shorts apiece). At best, you can cut this in half.
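The arithmetic behind those figures can be checked with a few lines. The layouts here are assumptions for illustration: four vertices per sprite (as with an indexed quad or a triangle strip), 16-bit components throughout, and an instanced record holding position, size, texture position, and texture size.

```python
import struct

# Non-instanced: 4 vertices, each with position (2 shorts) + texcoord (2 shorts).
per_vertex = struct.calcsize("<2h2h")
quad_bytes = 4 * per_vertex

# Instanced: one record per sprite with position, size, tex position, tex size.
instance_bytes = struct.calcsize("<2h2h2h2h")
```

Under these assumptions `quad_bytes` is 32 and `instance_bytes` is 16, which matches the "cut this in half" estimate. The shared unit-square vertices used by every instance are not counted, since they are stored once.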

But are the memory bandwidth savings going to come anywhere near the added overhead from doing instanced rendering? Not to mention that the vertex shader has to do actual work reconstructing vertex positions from a point+size, which will almost certainly include some form of conditional branching. And that branching will be decidedly non-uniform (unless you have a clever way to avoid that).

Ploppz · June 11, 2015, 7:46am

Thanks for the replies! I would like to make an implementation of this with a simple interface that I can use in the future, so I will consider the possibility that there are a lot of texts being displayed. Having an extra vertex attribute telling which transformation to use sounds nice. So I’m thinking about regenerating the vertex buffer at every frame / change, but this buffer should have quite a large capacity so that it doesn’t need to be reallocated when I reupload the vertices.
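A minimal sketch of the "large capacity, rarely reallocated" idea, simulated on the CPU with a `bytearray`. `VertexStore` is a hypothetical name, and the doubling growth policy is an assumption rather than something from the thread. In OpenGL terms, the allocation step would correspond to `glBufferData` with a null data pointer, and the per-change upload to `glBufferSubData`.

```python
class VertexStore:
    """CPU-side stand-in for a buffer object with spare capacity."""

    def __init__(self, capacity_bytes):
        self.capacity = max(1, capacity_bytes)
        self.storage = bytearray(self.capacity)   # allocate once, up front
        self.used = 0
        self.reallocations = 0

    def upload(self, data: bytes):
        if len(data) > self.capacity:
            # Grow geometrically so that reallocation stays rare.
            while self.capacity < len(data):
                self.capacity *= 2
            self.storage = bytearray(self.capacity)
            self.reallocations += 1
        # Overwrite the start of the existing storage; no new allocation.
        self.storage[:len(data)] = data
        self.used = len(data)


store = VertexStore(64)
store.upload(b"\x01" * 48)    # fits: no reallocation
store.upload(b"\x02" * 100)   # too big: capacity doubles to 128
store.upload(b"\x03" * 20)    # fits again: storage is reused
```

The draw call would then use only the first `used` bytes, so leftover data past that point is harmless.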

GClements · June 11, 2015, 8:08am

As with most forms of optimisation, it depends upon the specifics.

If memory bandwidth is saturated but processing capacity isn’t, replacing the former with the latter should be a net gain. If it’s the other way around, it will be a loss. Similarly, if the CPU is saturated but the GPU isn’t, offloading to the GPU should be an improvement.

The thing about memory footprint in general is that it has multiple costs: system memory consumption and bandwidth, CPU-side processing, upload bandwidth, video memory consumption and bandwidth. So my starting point tends to be to determine the bare minimum that the CPU has to touch. E.g. for moving sprites around (assuming that the motion has to originate on the CPU), the CPU needs to update a single 2D position per sprite, with everything else being deducible from that. That much will have to be recalculated, uploaded and stored each frame, but anything beyond that needs a reason not to be offloaded to the GPU.

Also, you may be able to do better than half. If sprites all have the same size (e.g. tiles), you may be able to deduce all four texture coordinates from a single 8-bit or 16-bit index. If the on-screen size is fixed, you only need one vertex position. If the aspect ratio is fixed, you only need one vertex position and a scalar size. Essentially, the data only needs as many components as it has actual degrees of freedom.
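For the fixed-size tile case, deducing the texture rectangle from a single index might look like the sketch below. It assumes a uniform grid atlas in normalised coordinates with tile 0 at the origin and indices running row by row; `tile_uv_rect` is a hypothetical helper, and in practice this calculation would more likely live in the vertex shader.

```python
def tile_uv_rect(index, columns, rows):
    """Return (u0, v0, u1, v1) for a tile in a uniform grid atlas."""
    if not 0 <= index < columns * rows:
        raise ValueError("tile index out of range")
    col = index % columns
    row = index // columns
    u0 = col / columns
    v0 = row / rows
    return (u0, v0, u0 + 1 / columns, v0 + 1 / rows)


rect = tile_uv_rect(5, columns=4, rows=4)
```

With a 4x4 atlas, index 5 lands in column 1, row 1, so the per-sprite texture data shrinks from four coordinates to one small integer.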

I’m not sure where branching would come into this. The non-instanced data consists of the vertices of a unit square; the calculated vertex positions and texture coordinates are all either base+size*offset or mix(p0,p1,offset), where base and size are per-instance and offset is the unit-square vertex.
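The two branch-free formulas can be tried out on the CPU as below. This is only a Python stand-in for what the vertex shader would compute; `expand` is a hypothetical name, `mix` mirrors the GLSL built-in of the same name, and the triangle-strip corner order is an assumption.

```python
UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # strip order


def expand(base, size, offset):
    """base + size * offset, component-wise."""
    return (base[0] + size[0] * offset[0], base[1] + size[1] * offset[1])


def mix(p0, p1, offset):
    """Component-wise GLSL-style mix: p0 * (1 - a) + p1 * a."""
    return tuple(a * (1.0 - t) + b * t for a, b, t in zip(p0, p1, offset))


# Per-instance data: either (base, size) or the two opposite corners (p0, p1).
corners = [expand((10.0, 20.0), (4.0, 8.0), o) for o in UNIT_SQUARE]
same = [mix((10.0, 20.0), (14.0, 28.0), o) for o in UNIT_SQUARE]
```

Both lists contain the same four corners, and neither function uses a conditional: the unit-square offset does the selecting through multiplication.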