How to pack mesh data into buffers when every mesh must be separately modifiable and resizable

I have a scene with thousands of objects. For each object I need to be able to swap out the mesh data completely (different data with different size).

If I were to use a shared buffer, swapping mesh data for an object would result in having to leave the old data in the buffer. Leaving old data wouldn’t be a big problem if you change the data a limited number of times, but I need to potentially be able to do it an unlimited number of times, so it would eventually eat up all the GPU memory.

What would you guys recommend? Should I use a separate buffer for each object?

Are these 100% completely unique objects, or instances of a smaller set? Your 2nd sentence suggests the former.

Does performance matter to you during updates, or just rendering the correct result?

If perf doesn’t matter (e.g. R&D renderer, not a production renderer), just create and delete buffer objects at runtime and upload to them just like you would at startup. This is simple, but may perform poorly.

If perf does matter, then it’s not just a matter of total GPU memory consumption but of avoiding implicit synchronization stalls in the driver during updates. For that, read the OpenGL Wiki’s Buffer Object Streaming page.

Also, your assumption that use of a shared buffer implies leaving old data in the buffer forever and unbounded GPU memory usage isn’t correct. Think “circular ring buffer” (updated efficiently using the techniques on that wiki page) and go from there.
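To make the ring-buffer idea concrete, here is a minimal sketch of the CPU-side offset bookkeeping. All names (`RingBuffer`, `ring_alloc`) and the alignment policy are illustrative assumptions, not from the post; the actual GL upload call is left as a comment, and a real implementation would also fence-check that the GPU is finished with any region being reused:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a streaming ring buffer. Each upload
 * grabs the next aligned region; when the end of the buffer is reached,
 * the write cursor wraps back to the start, reusing old space instead
 * of growing GPU memory without bound. */
typedef struct {
    size_t capacity;  /* total size of the GL buffer object, in bytes */
    size_t head;      /* next free byte offset */
    size_t align;     /* required offset alignment (e.g. 256 for UBOs) */
} RingBuffer;

/* Returns the offset at which `size` bytes should be written, or
 * (size_t)-1 if the request can never fit in the buffer. */
size_t ring_alloc(RingBuffer *rb, size_t size)
{
    if (size > rb->capacity)
        return (size_t)-1;

    /* round the write cursor up to the required alignment */
    size_t offset = (rb->head + rb->align - 1) / rb->align * rb->align;
    if (offset + size > rb->capacity)
        offset = 0;               /* wrap to the start of the buffer */

    rb->head = offset + size;
    /* ...then upload, e.g.:
     * glBufferSubData(GL_ARRAY_BUFFER, offset, size, data); */
    return offset;
}
```

The wrap is what bounds memory usage: old data is overwritten in place once it is no longer needed, rather than accumulating forever.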

And re GPU mem usage, something you should think through if you haven’t already:

  • What’s the amount of GPU memory offered by the lowest spec GPU that you’re targeting?
  • What’s the amount of GPU memory required for ALL of your models?
  • What’s the amount of GPU memory required for the largest set of your models that will need to be loaded+rendered at one time?

Thank you very much for the informative post!

Yes the objects are in fact unique.

Does performance matter

Performance does matter, so I guess I could use streaming. Can I ask you this: since I didn’t know about streaming, the way I handled uploading dynamic data before was by collecting all the mesh data on the CPU each frame, keeping track of size and offset for each mesh, merging everything into one big array, and uploading it with one call to glBufferData at the end of the frame, after which I would swap the buffers.

As I understand it, with streaming you instead allocate a ring buffer with enough size, and then upload each individual mesh directly into the first available position with glBufferSubData or by mapping. So does this mean that, since the GPU works asynchronously with the CPU, sending the data to the GPU directly when it becomes available is faster than waiting to upload everything together at the end of the frame?

Also I’m wondering: if I were to use separate buffers for each mesh, and I don’t need to modify most of them at any given time (the user decides which ones to modify), I can skip re-uploading their data. But with streaming, would I have to re-upload the data for every mesh on every frame?

The key is not where you upload the data (e.g. whether you use a ring buffer or not, when you reuse old space, etc.). The key is uploading the data in a way that will not trigger implicit synchronization in the GL driver. That’s what you’re trying to avoid.

PERSISTENT mapped buffers with explicit sync objects is one way. UNSYNCHRONIZED mapping with buffer orphaning is another. Though the latter may trigger implicit sync within the GL driver when it’s configured for multithreaded driver mode (if that matters to you).
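For the persistent-mapped approach, the CPU-side structure might look like the sketch below: the buffer is split into a few regions (triple buffering here) rotated round-robin, with the GL sync calls shown as comments. The names, region count, and structure are my assumptions for illustration, not something prescribed by the thread:

```c
#include <assert.h>

/* Sketch of CPU-side bookkeeping for a buffer created with
 * glBufferStorage and mapped once with glMapBufferRange
 * (GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT). One region is written
 * per frame; before reusing a region you wait on the fence recorded
 * when the GPU last consumed it, which avoids implicit sync stalls. */
#define NUM_REGIONS 3

typedef struct {
    int frame;   /* frames submitted so far */
    /* GLsync fences[NUM_REGIONS];  -- one fence per region in real code */
} StreamState;

/* Region to write this frame: simple round-robin over the regions. */
int region_for_frame(const StreamState *s)
{
    return s->frame % NUM_REGIONS;
}

void begin_frame(StreamState *s)
{
    int r = region_for_frame(s);
    /* if (fences[r]) glClientWaitSync(fences[r], GL_SYNC_FLUSH_COMMANDS_BIT,
     *                                 timeout);
     * then write this frame's data into mapped_ptr + r * region_size */
    (void)r;
}

void end_frame(StreamState *s)
{
    /* fences[region_for_frame(s)] =
     *     glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0); */
    s->frame++;
}
```

With three regions, the CPU only blocks in `glClientWaitSync` if it gets more than two full frames ahead of the GPU, which is exactly the bounded asynchrony described above.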

glBufferSubData() may or may not trigger implicit sync in the driver, depending on how it’s implemented. Whichever method you use, you’ll want to test perf with it on the driver(s) and configs you care about and ensure that it yields sufficient performance for your needs.

GPU working asynchronously with the CPU is generally what you want (within limits), but implicit sync in the driver thwarts that, blocking the CPU and forcing the GPU to (at least partially) catch up with the CPU. CPU-side buffer updates done poorly can easily trigger this because the GPU may still need to operate on the previous contents of a buffer object before the CPU update.

sending the data to GPU directly when available instead of waiting to upload everything together at the end of the frame is faster?

Not necessarily. I’m not sure where you got that.

No. Just because you use streaming doesn’t mean you have to re-upload everything every frame. You just upload the new stuff to the GPU, and reuse the old stuff that you previously uploaded, saving upload bandwidth and time. But you have to update and render with buffer objects efficiently, or you might as well just use old client arrays (where you do re-upload everything every frame).
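One hedged sketch of “only upload the new stuff”: keep a dirty flag and a last-upload offset per mesh, and each frame upload only the meshes the user actually changed. The types and helper name here are hypothetical, and `next_offset` stands in for whatever allocator (e.g. the ring buffer above) hands out fresh space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-mesh record: where its data lives in the buffer, and whether the
 * CPU-side copy has changed since the last upload. */
typedef struct {
    bool   dirty;
    size_t offset;   /* byte offset of this mesh's data in the buffer */
} MeshSlot;

/* Uploads only the dirty meshes, returning how many were (re)uploaded.
 * Clean meshes keep their old offsets and are drawn from data uploaded
 * on earlier frames. */
size_t upload_dirty(MeshSlot *meshes, size_t count, size_t *next_offset,
                    size_t mesh_size)
{
    size_t uploaded = 0;
    for (size_t i = 0; i < count; i++) {
        if (!meshes[i].dirty)
            continue;            /* unchanged: reuse old GPU data */
        meshes[i].offset = *next_offset;
        *next_offset += mesh_size;
        /* ...glBufferSubData(GL_ARRAY_BUFFER, meshes[i].offset,
         *                    mesh_size, cpu_data_for_mesh_i); */
        meshes[i].dirty = false;
        uploaded++;
    }
    return uploaded;
}
```

On a frame where the user touched nothing, this uploads zero bytes, which is the bandwidth saving described above.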

Which reminds me: benching your code against a non-VBO client arrays implementation can be instructive, as that code path has been hand-tuned by graphics driver developers for fast dynamic upload+draw performance on their GPUs.