Single VBO memory management and state change optimization techniques

Erik9631 · May 24, 2022, 11:16pm

This seems to be a poorly documented topic, and I spent hours researching it without finding any reasonable answer.

Basically, from all the various sources that I have researched (these very forums, Stack Overflow, Reddit, and even the Wiki), there is one general recommendation to guarantee high performance in OpenGL, and that is limiting the number of state changes.

I have managed to find a benchmark (from NVIDIA), though I am not sure how up to date it is, which shows various OpenGL operations and how expensive they are. (I can’t post links, otherwise I would place it here.)

So the most basic optimization, which seems to be suggested everywhere, is sorting your shaders, textures and buffers based on the benchmark, to guarantee the least amount of state changes per frame.
This is pretty straightforward and makes sense.

However, I dug up a pretty large hole when it comes to managing state changes of a VBO.
I have not found any reasonable documentation or benchmarks about how to do this efficiently, or about what is considered to be the most “modern, up to date method” of handling them.
Generally there are two approaches.

1. Use relatively many VBOs and bind them as appropriate (letting the drivers handle their memory management).
2. Use a single VBO per unique combination of vertex attributes that an object might have, bind it once, put all the data you need into it, and draw it via batch draw commands.

The second case, as I mentioned earlier, seems to be a massive rabbit hole, because using one large VBO isn’t as simple as it sounds.
From what I have understood, the basic idea is that you allocate a large chunk of GPU memory and manage it on your own.
So you need some kind of memory manager or a memory allocation system that “Allocates memory” within the buffer efficiently and fast.
So each time you want to add, or modify vertex data, the allocator finds the appropriate address within the VBO and places the data there.

Needless to say, writing a fast and efficient memory allocator isn’t trivial, and in practice, especially in operating systems, multiple algorithms are used (and chosen based on heuristics from metadata in order to guarantee the fastest memory allocation performance). You need to deal with metadata overhead, external fragmentation, internal fragmentation, and allocation speed, and it is nothing but a compromise between all of those.

I could spend hundreds of hours perfecting such a system, and even then it would be questionable whether the performance gain would outweigh using a separate VBO for almost everything.

What I find absolutely astonishing is there is literally zero documentation, zero discussion, zero anything about this. There are no general libraries that allow you to manage VBO memory, there are no “Best practices” or guidelines which warn you against possible pitfalls.

So all this is just a big smell for me and I am not even sure if this is the right way to go.

Further information regarding this topic would be appreciated.

Thanks.

Dark_Photon · May 25, 2022, 1:00am

No. No single change or combination of changes, whatever they are, will “guarantee high performance” in general, without any qualification of hardware, software, drivers, or technique.

Possibly you refer to the one mentioned here?:

originally posted in this NVIDIA presentation here?:

Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead (2014, Slides; p. 48)
Beyond Porting: How Modern OpenGL can Radically Reduce Driver Overhead (2014, YouTube; offset 31:56)

These aren’t really the only choices, or even the best choices.

It seems you may be unaware of interleaved vertex attributes, or that buffer objects themselves are typeless. For instance, you can store all of your vertex attributes and index lists, for multiple batches, in the same GPU buffer object, with no problems at all.

Sounds like you want to read this wiki page, if you haven’t already, and ask follow-up questions:

Buffer Object Streaming (OpenGL Wiki)

That’s for handling efficient streaming of dynamic content to the GPU. For static content, you don’t really need this.

True. But you really don’t want to go that route anyway.

Motivational blog post (that’s actually strongly related to your question):

This One Weird Trick Let’s You Beat the Experts (Supnik, 2021-10)

But seriously, for dynamically uploaded content, just read that Buffer Object Streaming wiki page and ask questions.

You just haven’t tripped over it yet. There are chapters in books about some of this.

Two basic issues:

1. How to manage buffer object memory for vertex attributes and indices.
2. How to issue draw calls sourcing from those buffer objects most efficiently.

From a performance standpoint, the cost of #2 is actually a driver for #1. **

Re #2, I can’t tell you how your GPU+driver performs, but you can run tests to determine cost per draw call in various scenarios. I can tell you that on NVIDIA GPUs, command lists and NVIDIA bindless are about the fastest ways to issue batches (draw calls) referring to VBO data. Behind that, it’s VAOs backed by VBO data and client arrays (depending on batch size). And often slowest, plain old unaccelerated VBOs. The larger your batches (draw calls), the fewer your draw calls, and the less the cost per draw call matters. But bottom-line is, you care about minimizing total draw call cost and overhead, especially if you have a ton of tiny batches.

Re #1, for static content, just pre-upload that on startup to whatever buffer objects you want. To the extent that you’ll batch draws for multiple objects together in shared draw calls, obviously you’d want to combine those in the same buffer object or objects (they talk about this way back in that 2014 presentation, for instance). For dynamically-uploaded content, see that Buffer Object Streaming page for a primer. In either case, you make those buffer objects as quickly accessible to draw calls as possible, to keep your cost per draw low. Minimizing draw calls by batching more content together can help reduce this overhead, to a point. But at some point you start trading culling efficiency.

** (There’s also the issue of how to dynamically generate this content on the GPU when needed, but we’ll assume you’re not doing that for now.)

Alfonse_Reinheart · May 25, 2022, 6:32am

While not using a bunch of buffers needlessly is good, it should be noted that in the pantheon of state changes, vertex buffer state is the least expensive to change next to non-block uniform state. Vertex format state (all of the glVertexAttribFormat stuff) is expensive, but the source bindings (glBindVertexBuffer, using separate attribute formats) are pretty cheap.

Again, this doesn’t mean that you should make no effort to minimize these things, but I would focus on the big ones first: programs, texture bindings, and framebuffers.

It should also be noted that any kind of buffer allocation scheme is ultimately going to depend on your use scenarios. There is no “best solution”; what is “best” depends on how you’re getting meshes, loading new ones, when you unload old ones, etc.

Erik9631 · May 25, 2022, 9:34am

No. No single change or combination of changes, whatever they are, will “guarantee high performance” in general, without any qualification of hardware, software, drivers, or technique.

In general, you can only influence your app, and by a “guarantee” I mean that an app that doesn’t do this optimization is going to be slower than an app that does it, no matter the hardware or the drivers it is running on, like you mentioned in Re #2.

Possibly you refer to the one mentioned here?:

Yes that is exactly the benchmark.

These aren’t really the only choices, or even the best choices.

It seems you may be unaware of interleaved vertex attributes, or that buffer objects themselves are typeless. For instance, you can store all of your vertex attributes and index lists, for multiple batches, in the same GPU buffer object, with no problems at all.

That is not what I mean. Perhaps I wasn’t clear with my wording. I am aware of those and actively using them.
What I really meant is that you need a different VBO for every unique combination of vertex attributes that an object might have.
I edited the post to clarify this.

Sounds like you want to read this wiki page, if you haven’t already, and ask follow-up questions:

That’s for handling efficient streaming of dynamic content to the GPU. For static content, you don’t really need this.

I have read it a while ago and I don’t see how it is relevant to the memory management issue.

To clarify the problem better, I am looking for a way to efficiently manage inner VBO memory—allocate, reallocate when VBO is too small, prevent internal fragmentation and external fragmentation.
And I am looking for relevant information that covers this topic in detail. I was originally hoping for a library or something that would already solve this issue.
Do you have any recommendations?
You mentioned books that cover this topic, I would like to know some examples.

I am also aware of NVIDIA bindless, which I was researching, but I was unable to find any proper documentation, except their presentations.

mhagain · May 25, 2022, 11:10am

No, you don’t, because VBOs are typeless.

Just to say this explicitly: you can put two different vertex formats into a single VBO. All you need to do is set your strides, offsets, etc correctly and it will work.

Generally a more useful division is between static objects and dynamic objects, so separate VBOs between this type of usage instead. For dynamic objects use streaming or subdata, depending on your update pattern. For static objects you can put everything into a single large buffer if that suits your program. If there’s not enough space in your large buffer, you can just allocate another - you don’t need to worry about resizing existing buffers, and two buffer changes vs one isn’t going to make any measurable difference.

Erik9631 · May 25, 2022, 11:21am

No, you don’t, because VBOs are typeless.

Just to say this explicitly: you can put two different vertex formats into a single VBO. All you need to do is set your strides, offsets, etc correctly and it will work.

Sigh

This is very nitpicky and not that helpful.
I know they are typeless, but once you have a VBO with a certain format, you are obviously NOT going to bind different vertex attributes to it. You would have to change the format of the data in the VBO, which serves no purpose.
It is generally better to just make another VBO with a different set of unique attributes and make the data follow the format from the start.

And to the second part of the paragraph.
Can you be more specific? How does knowing the difference between static and dynamic help me solve the issue? In both cases there is data that resides in memory, and every mesh resides at a certain address in the VBO.
If an object no longer needs to be rendered (either it is out of range or the actor it was assigned to was removed from the scene), then you have fragmentation already.

In larger games, removing a large number of objects is pretty common, and allocating new space each time the limit is reached is a very naïve solution.
You are essentially not reusing any memory space from the VBO that you have allocated.

mhagain · May 25, 2022, 1:22pm

Why not? There’s nothing in the API to stop you from doing it, and if it’s what suits your data management requirements, then you can just do it.

No, no, no, no, no.

There’s a reason why people are repeating the same thing to you, and you just don’t seem to be understanding it. VBOs are typeless; there is no such thing as the “format of the data in the VBO”; it’s the moral equivalent of a void * pointer.

Again, no. There is nothing in the API requiring you to do anything of the sort.

Let’s pick a simplified example. Let’s assume that you have two vertex formats, like so:

struct vertex1 {
    float position[2];
};

struct vertex2 {
    float position[3];
};

You can put 2000 of vertex1 and 3000 of vertex2 into the same VBO, by just loading your data for vertex2 into the byte offset at sizeof (vertex1) * 2000.

Then when setting up your vertex format, for drawing data using vertex1 you set it to use offset 0, for drawing data using vertex2 you set it to use offset sizeof (vertex1) * 2000.
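
The offset arithmetic described here can be sketched in plain C. This is only an illustration: `vertex_range` and `append_range` are hypothetical names (not part of OpenGL), and 4-byte floats with no struct padding are assumed. It models only the bookkeeping on the CPU side; no GL calls are made.

```c
#include <assert.h>
#include <stddef.h>

struct vertex1 { float position[2]; };
struct vertex2 { float position[3]; };

/* Hypothetical helper type: where one run of vertices lives in a shared buffer. */
struct vertex_range {
    size_t offset;  /* byte offset of the first vertex in the buffer */
    size_t stride;  /* bytes between consecutive vertices */
    size_t count;   /* number of vertices in the run */
};

/* Place `count` vertices of `stride` bytes at the current end of the buffer
   (*cursor), then advance the cursor past them. */
struct vertex_range append_range(size_t *cursor, size_t stride, size_t count)
{
    assert(stride > 0);
    struct vertex_range r = { *cursor, stride, count };
    *cursor += stride * count;
    return r;
}
```

Appending 2000 `vertex1` and then 3000 `vertex2` starting from a cursor of 0 puts the first run at offset 0 and the second at `sizeof (struct vertex1) * 2000`. Each range's `offset` and `stride` are the values you would then hand to whichever GL call sets up that vertex format.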

So to clarify the main points:

VBOs are typeless; there is no such thing as the “format of the data in the VBO”.
There is not a one-to-one correspondence between a VBO and a vertex format; there is not a one-to-one correspondence between a VBO and a draw call.
The GL calls for setting up a vertex format which take an offset parameter can be used for offsetting into a VBO to where the data begins; they are not just for offsetting into different members of a struct starting from position 0.

Alfonse_Reinheart · May 25, 2022, 1:53pm

The problem here is that any useful answer will have to be based on the specific patterns of your usage.

For example, consider a game with a streaming world. Each streaming section contains textures and it contains mesh data stored in buffers. So… how do you deal with that?

You could build your world such that the vertex data for a streaming section must fit into a specific maximum byte size. Therefore, you have a fixed number of buffer objects (more than is needed for the visible area) that each have the same size of storage. When you get far enough away from a section that it is no longer used, you mark its buffer as unused. When you stream in a new section, you pick up an unused buffer and load the vertex data into that buffer’s storage. Under such a paradigm, you never need to create new buffers (at least, not for streamed data); you’re just reusing them.

That’s a very efficient scheme. However, it relies on having a fixed buffer size for all sections. Maybe that’s a limitation you cannot afford. If that’s the case, then you’re going to have to do something else. What exactly that “something else” is depends on what limitations you’re willing to live with.

So ultimately, what you’re asking cannot be answered without understanding the details of what your loading/unloading patterns are, what limitations you’re willing to impose, etc. You can beat malloc, but only when you have a situation that’s more specific than malloc covers. Without knowing those specifics, there isn’t much we can do.

It’s important to divorce what you think you have to do from what you actually have to do. The OpenGL API doesn’t make you do this; if you feel that it is “better”, that is only because you feel that way, not because it has to be “better” for some reason.

GClements · May 25, 2022, 4:22pm

If you’re using glVertexAttribFormat, glVertexAttribBinding and glBindVertexBuffer (rather than glVertexAttribPointer), the glVertexAttribFormat relativeoffset parameter is (realistically) limited to offsetting into members of a structure, as the maximum value (queried via GL_MAX_VERTEX_ATTRIB_RELATIVE_OFFSET) isn’t required to be more than 2047. The glBindVertexBuffer offset parameter has no such limitation, and can be used to select a sub-region of the buffer.
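
A rough sketch of that split, in plain C with no GL calls: the start of the sub-region goes into the binding offset, and only the member offset within the vertex struct goes into the relative offset. `lit_vertex`, `attrib_location`, and `locate_attrib` are hypothetical names for illustration, and the 2047 constant is the minimum mentioned above, not a value queried from a driver.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest maximum an implementation must support, per the post above. */
#define MIN_GUARANTEED_RELATIVE_OFFSET 2047u

/* Hypothetical interleaved vertex layout (4-byte floats assumed). */
struct lit_vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

struct attrib_location {
    size_t binding_offset;   /* would be passed as glBindVertexBuffer's offset */
    size_t relative_offset;  /* would be passed as glVertexAttribFormat's relativeoffset */
};

/* Keep the large sub-region offset in the buffer binding and only the small
   struct-member offset in the attribute format. */
struct attrib_location locate_attrib(size_t region_start, size_t member_offset)
{
    assert(member_offset <= MIN_GUARANTEED_RELATIVE_OFFSET);
    struct attrib_location loc = { region_start, member_offset };
    return loc;
}
```

For example, `locate_attrib(16000, offsetof(struct lit_vertex, uv))` keeps 16000 in the binding offset and leaves only the member offset of `uv` as the relative offset.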

Erik9631 · May 25, 2022, 5:09pm

The point of the API is not to impose limits on the user. The fact that the API allows it doesn’t mean that it has any meaning or is useful.

No, no, no, no, no.

There’s a reason why people are repeating the same thing to you, and you just don’t seem to be understanding it. VBOs are typeless; there is no such thing as the “format of the data in the VBO”; it’s the moral equivalent of a void * pointer.

You are the only one repeating that, because instead of reading what I say and putting some effort into comprehending the question and trying to understand the context, you are acting like a dictionary, nitpicking what I say sentence by sentence.

You defined the VBO format in the next paragraph.
The VBO contains specific data at specific offsets. Once you expect that data to be somewhere, you cannot go around inserting different data at completely different offsets with a different size, otherwise whatever attributes you have bound, or will bind, will interpret it as incoherent nonsense.

Again, no. There is nothing in the API requiring you to do anything of the sort.

And again, the fact that the API doesn’t stop you doesn’t mean it isn’t pointless.

You can put 2000 of vertex1 and 3000 of vertex2 into the same VBO, by just loading your data for vertex2 into the byte offset at sizeof (vertex1) * 2000.

Yes you can, because you already defined the format of the VBO and you know that between the memory offsets of 0 and 0x3E80, you will have a set of Vertex1s and between address 0x3E80 - 0xCB20 you will have a set of Vertex2s. You literally defined the VBO format, by specifying the strides and the offsets for both attributes.

However, doing what you suggest, which is binding a vertex attribute which doesn’t follow the expected format, say one that has a stride of 4 instead of 8 at offset 0 - 0x3E80, or one that has the stride 2 instead of 12 at offset 0x3E80 - 0xCB20, or worse, with an offset that is not 0 or 0x3E80, then you will end up with incoherent nonsense.

The fun thing about void pointers is that they are meaningless unless you know how to interpret the data. And how the data is interpreted is given by how it is laid out on the specific memory address.

So claiming that VBOs have no data format is purely and completely false. It is the equivalent of saying that an app loaded in RAM is not segmented and is laid out in a purely chaotic way. If that were the case, the operating system wouldn’t even be able to execute the app.

That’s a very efficient scheme. However, it relies on having a fixed buffer size for all sections. Maybe that’s a limitation you cannot afford. If that’s the case, then you’re going to have to do something else. What exactly that “something else” is depends on what limitations you’re willing to live with.

Thanks.
Yes, generally what you wrote is one of the use cases that I want, but it is covered in the streaming section of the wiki that you already posted, so this is fine.

However, another use case that I am asking about is how to handle dynamically removing or adding objects in the scene. Once they are in the VBO and are later marked for removal (because their actors are no longer part of the scene), this would create fragmentation of the VBO.

How is this specifically handled? How do modern game engines handle this efficiently?
This is what I specifically talked about when I mentioned that guides for something like this are scarce (I couldn’t find anything).
So that is why, if you have a book, or a blog or any kind of documentation, that covers it and talks about a method which allows me to handle this quickly and efficiently, I would appreciate it.

mhagain · May 25, 2022, 5:20pm

But that’s not what I suggested. At all.

Alfonse_Reinheart · May 25, 2022, 5:30pm

If you’re just going to ignore advice you don’t agree with, why did you ask the question to begin with?

You are being told that the API allows this for a reason. Saying that it allows it by accident, and that you’re not expected to use this freedom, is just ignoring all the people who actually use it.

No, he defined two vertex formats. The term “the VBO format”, by its nature, means that there is only one. His example showed two.

The point he’s trying to make is that there is no single “the VBO format”. Different regions of a buffer can have different formats; the buffer as a whole does not have to have a single format.

The overall point is that you shouldn’t have so many different buffer objects. Sub-dividing buffers, such that different meshes with potentially different vertex formats share the same buffer object, is a good thing.

Consider the example I gave about having fixed-sized blocks. You would only need a single buffer, with N blocks of the fixed size. You have a table that says which of those blocks is active, and to render a particular mesh within a block, you apply the byte offset for that specific block. Within a block, you can have as many vertex arrays, each with different vertex formats, as you like.

And all with a single buffer object.

That’s not a specific use case. You’re basically asking what to do if objects need to arbitrarily be added to and removed from memory. Well, if there are no controls for any of that, if it can happen at any time without any coherency, rhyme, or reason, then you’re going to have to write the equivalent of a generic memory allocator.

Most programs don’t arbitrarily load and unload things. Or if they do, they have a fixed amount of stuff that they can arbitrarily load and unload. There are no guides for the general scenario because most people avoid the general scenario.

Erik9631 · May 25, 2022, 6:09pm

But that’s not what I suggested. At all.

I am sorry, but we all seem to be heavily misunderstanding each other here.

If you’re just going to ignore advice you don’t agree with, why did you ask the question to begin with?

Because literally NONE of the answers are answering what I have asked, you are all going in circles and arguing about semantics.

Common sense people please, come on.

No, he defined two vertex formats. The term “the VBO format”, by its nature, means that there is only one. His example showed two.

VBO stands for Vertex Buffer Object.
If I say Vertex Buffer Object format, I am obviously talking about the way the buffer object is laid out inside. The fact that there are two Vertex Formats not VBO formats doesn’t change that. The VBO follows a single pattern, just like you stated, which is as whole defined by the two Vertex Formats in this case.
If you don’t know the pattern, you can’t read it.
Now I am not an OpenGL expert, far from it, and I don’t know some of the terminology, that is why I am trying to explain the things I say in detail.
I hate arguing about stuff like this because it is just semantics and it is completely beside the point. We all seem to be talking about the exact same thing, so why can’t we just get along?

I made the mistake of trying to generalize some of my statements, which is hard to do, and they weren’t on point, which I am sorry for, and I get it.
But I also got frustrated by the fact that those unnecessary details got nitpicked and are beside the point of my question.

That’s not a specific use case. You’re basically asking what to do if objects need to arbitrarily be added to and removed from memory. Well, if there are no controls for any of that, if it can happen at any time without any coherency, rhyme, or reason, then you’re going to have to write the equivalent of a generic memory allocator.

As if this wasn’t already in the title of the post and wasn’t the whole point why I made it in the first place.

Most programs don’t arbitrarily load and unload things.

Engines like Unreal handle this somehow. Take their editor as an example. If you have hundreds or thousands of different models, you can place them in the “Scene” at any time. You can’t load all of those models into memory at once; they would be too big.

But now that we finally understand each other, my question is HOW do they handle this. I am not asking for a specific answer to a specific issue, I am asking for materials that cover this topic.

I can have a more specific example if you want:
Imagine a very simple map that represents an arena. In this arena you can have different models that perform some actions. Certain actions cause a model to die, and it is replaced by another random model.
You can’t have all the models in memory, as they would be too big.
You want to be able to unload and load them seamlessly.

Correct me if I am wrong, but what you seem to be suggesting, like generally, excluding the example I made, is caching the models that I expect to be used in the specific area of the map for example and remove those that are not expected to be there based on some generic set of conditions. This is a nice solution which I like, however it doesn’t cover all the use cases.
If I had a huge map which could randomly have any number of models that could spawn anywhere and I couldn’t place them all in the GPU, how would I handle that?

I know for example, that GTA solves this, by only spawning cars which are currently cached in the memory. That is why you can observe a phenomenon that you generally see a lot of cars spawning on the street that are of the same model as the one driven by the player.

But this is only one method of handling this and this is all very nice because if this is what you are implying then we are going in the right direction and we are closing in to an answer.

Now what I want, is some kind of collection of similar methods that I can learn about and look at their pros and cons. I literally just want to expand my knowledge and find appropriate solutions.

I am currently writing my own rendering engine, and obviously the more general the solution is, the better. I already wrote a best-fit memory manager that handles this on the GPU, and it works.
I also did benchmarks, and it seems to be just a tad slower than malloc in the most general case (I haven’t tested all the possible use cases, so it might be very slow with certain allocations). It has higher upper bounds and lower bounds (but that is a configuration problem, because my custom allocator starts with very little memory and does a lot of reallocations at the start).

This seemed to me at least, the most general solution, however I don’t wanna live in a bubble and the fact that I couldn’t find literally any materials that cover this, made me question whether I am not overcomplicating a very simple issue.
There are definitely people, with way more knowledge than me, especially in large companies, that somehow solved this issue and all I am looking for is to learn/find out how to make this work efficiently.

That is the whole reason why I came here and made this post.

Alfonse_Reinheart · May 25, 2022, 7:24pm

Here’s the thing: OpenGL has a definition of what “vertex format” means, and the layout of the entire buffer object isn’t that. A vertex format by OpenGL parlance is defined by glVertexAttrib*Format or the other functions glVertexAttrib*Pointer. A vertex format is defined by part of the VAO. So when you say “the VBO format,” that is what we’re thinking of because that’s how OpenGL defines it.

OK: Do I get to limit the size and complexity of each model to a specific byte size? And since loading a model also involves loading textures specific to that model, do I get to force them to a byte size (and format. And texture size) as well?

If the answer to that is “no”, then you’re going to have to write some kind of general purpose memory allocator. Or something suitably like that. Allocate a single buffer object and sub-allocate from it as needed. Make sure to preserve the ability to move data around within the buffer to reduce fragmentation.
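
As a minimal sketch of what “sub-allocate from it as needed” could look like, here is a first-fit free-list allocator over byte offsets, in plain C. It is not taken from any engine and is not the best-fit scheme mentioned earlier in the thread. All names (`suballocator`, `suballoc_alloc`, and so on) are hypothetical, the free list has a fixed capacity, and alignment is ignored. The returned offsets are meant to be used as offsets into one large buffer object.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FREE_RANGES 64
#define ALLOC_FAILED ((size_t)-1)

struct free_range { size_t offset, size; };

/* Free ranges are kept sorted by offset and are never adjacent to each other. */
struct suballocator {
    struct free_range ranges[MAX_FREE_RANGES];
    size_t count;
};

void suballoc_init(struct suballocator *a, size_t capacity)
{
    a->ranges[0].offset = 0;
    a->ranges[0].size = capacity;
    a->count = 1;
}

static void remove_range(struct suballocator *a, size_t i)
{
    for (size_t j = i + 1; j < a->count; ++j)
        a->ranges[j - 1] = a->ranges[j];
    a->count--;
}

/* First fit: carve `size` bytes off the front of the first range that is big enough. */
size_t suballoc_alloc(struct suballocator *a, size_t size)
{
    assert(size > 0);
    for (size_t i = 0; i < a->count; ++i) {
        struct free_range *r = &a->ranges[i];
        if (r->size < size)
            continue;
        size_t offset = r->offset;
        r->offset += size;
        r->size -= size;
        if (r->size == 0)
            remove_range(a, i);
        return offset;
    }
    return ALLOC_FAILED;
}

/* Give a block back and merge it with free neighbours.
   Returns 0 on success, -1 if the free-range table is full. */
int suballoc_free(struct suballocator *a, size_t offset, size_t size)
{
    size_t i = 0;
    while (i < a->count && a->ranges[i].offset < offset)
        ++i;  /* i = index of the first free range after the freed block */
    int merges_prev = i > 0 &&
        a->ranges[i - 1].offset + a->ranges[i - 1].size == offset;
    int merges_next = i < a->count && offset + size == a->ranges[i].offset;

    if (merges_prev && merges_next) {
        a->ranges[i - 1].size += size + a->ranges[i].size;
        remove_range(a, i);
    } else if (merges_prev) {
        a->ranges[i - 1].size += size;
    } else if (merges_next) {
        a->ranges[i].offset = offset;
        a->ranges[i].size += size;
    } else {
        if (a->count == MAX_FREE_RANGES)
            return -1;
        for (size_t j = a->count; j > i; --j)
            a->ranges[j] = a->ranges[j - 1];
        a->ranges[i].offset = offset;
        a->ranges[i].size = size;
        a->count++;
    }
    return 0;
}
```

With a capacity of 100, three 30-byte allocations followed by freeing the middle one leaves 40 bytes free in total, yet a 40-byte request still fails because no single free range is that large. That is the external fragmentation this thread keeps coming back to.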

And even this could be improved by having domain knowledge. For example, if you have foreknowledge that a model will be unloaded “soon,” (for example, if an entity has a death animation during which the model needs to be there, but the model can go away once it’s over) it would be useful to flag the model as “soon to be destroyed”. If you have regular defragmentation passes, you can choose to move any such model to the end of the allocation arena.
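
A defragmentation pass of the simplest kind (slide every live allocation toward the start of the arena) might be sketched like this. It is only a CPU-side illustration: `live_mesh` and `compact_meshes` are hypothetical names, `memmove` stands in for whatever buffer-to-buffer copy you would actually issue on the GPU, and the “soon to be destroyed” flag described above is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one live allocation inside the arena. */
struct live_mesh { size_t offset, size; };

/* Slide all live meshes toward offset 0, closing the gaps between them.
   `meshes` must be sorted by ascending offset and must not overlap.
   Returns the number of bytes in use after compaction. */
size_t compact_meshes(unsigned char *storage, struct live_mesh *meshes, size_t n)
{
    size_t cursor = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(meshes[i].offset >= cursor);
        if (meshes[i].offset != cursor) {
            /* Source and destination may overlap, hence memmove. */
            memmove(storage + cursor, storage + meshes[i].offset, meshes[i].size);
            meshes[i].offset = cursor;
        }
        cursor += meshes[i].size;
    }
    return cursor;
}
```

Anything that has cached an old offset (draw parameters, for instance) has to be updated after such a pass, which is part of why it helps to know in advance which models are about to go away.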

GPUs are finite, and they have finite memory resources. At some point, there has to be some limit on the amount of stuff that’s accessible at any one area. That limit is either something imposed upon the people building the map and deciding what spawns in, or will be imposed when you are unable to allocate further memory from the GPU.

Either way, you will encounter a limitation.

No. Because any solution has to be based on domain knowledge specific to the problem in question.

Consider your GTA example. They were able to use that solution because they knew they were dealing with models of cars. It would not be a general solution to avoid loading any mesh by just swapping it out with one that’s already loaded. That would lead to boats driving on streets. Or planes. Or buildings. And some of those don’t have wheels or doors, so how would animations work on them? The different cars in GTA were designed to be interchangeable to one degree or another.

Also, consider where the logic for this solution lives. It’s not the game “engine” that does this swap; it’s the game code itself. The game decides to spawn some car entities, and those entities know that they use car models. Those car entities look at what is loaded to see what car models are available rather than forcing a load. That’s all game code stuff, not game engine decision making. The game code may ask the engine what is loaded, but the decision to just pick from what is already loaded and not attempt to load something new is done by game logic, not the game engine.

Every good solution is a complex interplay between what the game understands and can tolerate, and what the game engine says the limits of memory and load times are. Even on the Unreal Engine (or similar tools), games are designed understanding how UE works, how long it takes to load things, how much space they take up, and what they can afford to fudge things on (and what they cannot).

They don’t just point at some models and tell UE to work out the details (at least, not in terms of resource allocation). Not unless they know that what they’re doing is already well within the limits of the system.

Erik9631 · May 25, 2022, 8:19pm

Thanks a lot
This seems to answer probably all of my questions.
There is still room for discussion, however:

OK: Do I get to limit the size and complexity of each model to a specific byte size? And since loading a model also involves loading textures specific to that model, do I get to force them to a byte size (and format. And texture size) as well?

What do you mean by “Limit the size and complexity”?
If you mean guaranteeing that the models and textures won’t be above size N, then how would that help solve the issue?

Let’s try an example:
If we have a total GPU memory of 100 KB and 200 meshes, each with a random size between 0.5 KB and 2 KB, and the maximum number of models in the scene is 50 (models are also without textures), and each time one object dies an RNG function with a uniform distribution picks another model out of the 200 to replace it, and you want to guarantee that this happens without causing stuttering or noticeable performance impact,
would there be a more efficient way of handling this, other than using an allocator?

The way I see this is that you could go with the first 50 objects without much of a problem. With a little luck you will have a fair chance that you will have enough memory in reserve to continue allocating objects. At around 100 objects, maybe sooner, you will very likely be out of memory.
And this is where it becomes interesting. How do we proceed from here? Since it is a uniform distribution there is a fair chance I will get an object that has not been loaded in and I will have to swap it out in the memory for something else and since it is uniform, it could be any of the 200 meshes.
So any smart caching can be thrown out of the window in this case.
So ideally you want to erase or unload unused objects and allocate in the space they leave, which is our well-known allocation problem (with all the pros and cons) that we have been talking about.

What is your take on this problem? Do you think there is a better way of handling this?

Alfonse_Reinheart · May 26, 2022, 1:48pm

Given that the maximum mesh size is 2K, and the maximum storage is 100K, and the maximum number of meshes is 50, then the solution is fairly obvious: create a single buffer of 100K in size and make a pool allocator with 50 2K blocks.

Whenever a “mesh dies,” you remove it from the pool, making that slot empty. Whenever a new mesh is needed, you upload it to an empty slot in the pool.
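
A pool like this is small enough to sketch in full. The following plain C is an illustration under the numbers given above (50 slots of 2 KB, taking 1 KB as 1024 bytes); the names `mesh_pool`, `pool_acquire`, and so on are hypothetical, and only slot bookkeeping is shown, not the upload itself.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 50
#define SLOT_SIZE  2048u  /* 2 KB per slot; 50 slots fill a 100 KB buffer */
#define NO_SLOT    (-1)

/* Stack of free slot indices. */
struct mesh_pool {
    int free_stack[SLOT_COUNT];
    int free_count;
};

void pool_init(struct mesh_pool *p)
{
    /* Push indices in reverse so that slot 0 is handed out first. */
    for (int i = 0; i < SLOT_COUNT; ++i)
        p->free_stack[i] = SLOT_COUNT - 1 - i;
    p->free_count = SLOT_COUNT;
}

/* Take a free slot, or NO_SLOT if all 50 are in use. */
int pool_acquire(struct mesh_pool *p)
{
    if (p->free_count == 0)
        return NO_SLOT;
    return p->free_stack[--p->free_count];
}

/* Return a slot to the pool when its mesh "dies". */
void pool_release(struct mesh_pool *p, int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    assert(p->free_count < SLOT_COUNT);
    p->free_stack[p->free_count++] = slot;
}

/* Byte offset of a slot inside the single shared buffer. */
size_t pool_slot_offset(int slot)
{
    assert(slot >= 0 && slot < SLOT_COUNT);
    return (size_t)slot * SLOT_SIZE;
}
```

Acquire and release are both constant time, and because every slot has the same size there is no external fragmentation to manage. The cost is the unused tail of each slot whenever a mesh is smaller than 2 KB.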

Does this mean that there’s a chance of having duplicate mesh data? Yes. But avoiding “stuttering or noticeable performance impact” is not about using the fastest methods; it’s about using the most consistent methods.

If doing a model upload is going to cause stuttering, then that is a problem you need to solve. How you allocate and use GPU memory is irrelevant to fixing that. Solving that problem relies on being able to know ahead of time which mesh to use and doing work before it’s necessary to actually use it.

Dark_Photon · May 26, 2022, 6:01pm

I wish that were true, but unfortunately it’s not.

Update a slot
Render models
Update a slot
Render models
…rinse/repeat…

Think like a driver. What’s going to happen here?

If you’re making any assumptions about transfer method, sync, flushing, and queue-ahead, you might want to state them. Because without them, the above sounds like a recipe for implicit sync stalls.

Alfonse_Reinheart · May 26, 2022, 6:19pm

Then don’t let there be opportunities for implicit sync. Map the buffer persistently and do any necessary synchronization yourself.
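
One way to picture “do any necessary synchronization yourself” is to track, per slot, the last frame whose draws read it, and to refuse to overwrite the slot until the GPU is known to have finished that frame. The sketch below models only that bookkeeping in plain C. `slot_sync` and its functions are hypothetical names, frames are numbered from 1, and `gpu_completed_frame` is assumed to come from whatever mechanism you use to learn that a frame has finished (sync objects such as those created with glFenceSync, for instance).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SYNC_SLOT_COUNT 4

/* For each slot: the last frame whose draw calls read from it (0 = never read). */
struct slot_sync {
    uint64_t last_read_frame[SYNC_SLOT_COUNT];
};

/* Record that draws submitted during `frame` source data from `slot`. */
void slot_mark_read(struct slot_sync *s, int slot, uint64_t frame)
{
    assert(slot >= 0 && slot < SYNC_SLOT_COUNT);
    s->last_read_frame[slot] = frame;
}

/* A slot may be overwritten only once the GPU has finished every frame that
   read it; otherwise the write could race with in-flight draws. */
bool slot_safe_to_write(const struct slot_sync *s, int slot,
                        uint64_t gpu_completed_frame)
{
    assert(slot >= 0 && slot < SYNC_SLOT_COUNT);
    return s->last_read_frame[slot] <= gpu_completed_frame;
}
```

When the check fails, the application can pick a different free slot or defer the upload by a frame, instead of letting the driver stall on its behalf.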

I’m assuming good use of the API to begin with. If you don’t start with a good foundation, the higher level stuff won’t matter.

Dark_Photon · May 26, 2022, 6:58pm

Cool. That’s what I thought.

I figured that’s important to explicitly include in this thread though, for those readers for whom this detail wasn’t implicit. And for @Erik9631, since he’s trying to pick up a specific VBO usage/mgmt method.