ARB_instanced_arrays = slow?

peterfilm · June 27, 2011, 4:23am

On the very different topic of streaming geometry to the GPU using VBO orphaning: Dark Photon, could you describe this process as implemented by your good self, please?

Dark_Photon · June 28, 2011, 6:54am

Nearly all of the credit goes to Rob Barris here, as he described the original technique. I just proposed adding batch reuse to it, and using bindless for extra-fast batch dispatch. I also use it for canned batches – not just generated/decompressed batches.

Good links to read on this:

VBOs strangely slow? (OpenGL.org thread, 2/23/10) (focus on Rob Barris’ posts)
Buffer Object Streaming (OpenGL.org Wiki)
mega vbo, any reservations? (OpenGL.org thread, 9/8/10)
The A to Z of DX10 Performance (pg 9)
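
For anyone who wants the gist before following the links, here is a minimal C++ sketch of streaming with orphaning plus batch reuse. It only does the CPU-side bookkeeping, so it compiles without a GL context; the GL calls a real renderer would issue are noted in comments. The names `StreamingVbo`, `BatchLocation`, `acquire` and `orphanCount` are hypothetical and made up for illustration. This is not the actual code from any of the posts above, and it ignores alignment and bindless dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Where a batch lives inside the streaming VBO.
struct BatchLocation {
    std::size_t offset;  // byte offset of the batch in the buffer
    std::size_t size;    // byte size of the batch
};

// Hypothetical helper: append-only streaming buffer with batch reuse.
class StreamingVbo {
public:
    explicit StreamingVbo(std::size_t capacity) : capacity_(capacity) {}

    // Returns the location of batch `id`. If the batch is still resident
    // it is reused and `uploaded` is set to false; otherwise space is
    // reserved at the write cursor and `uploaded` is set to true.
    BatchLocation acquire(std::uint64_t id, std::size_t size, bool& uploaded) {
        assert(size <= capacity_ && "batch larger than the whole buffer");

        auto it = resident_.find(id);
        if (it != resident_.end()) {
            uploaded = false;           // reuse: no upload, just draw again
            return it->second;
        }

        if (cursor_ + size > capacity_) {
            orphan();                   // buffer full: start over in fresh storage
        }

        BatchLocation loc{cursor_, size};
        // Real code would write the vertex data here, for example via
        // glMapBufferRange(GL_ARRAY_BUFFER, loc.offset, loc.size,
        //     GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT)
        // followed by glUnmapBuffer(GL_ARRAY_BUFFER).
        cursor_ += size;
        resident_[id] = loc;
        uploaded = true;
        return loc;
    }

    std::size_t orphanCount() const { return orphanCount_; }

private:
    void orphan() {
        // Real code: glBufferData(GL_ARRAY_BUFFER, capacity_, nullptr,
        // GL_STREAM_DRAW). The driver can hand back new storage while
        // in-flight draws keep reading the old one.
        cursor_ = 0;
        resident_.clear();              // old offsets are no longer valid
        ++orphanCount_;
    }

    std::size_t capacity_;
    std::size_t cursor_ = 0;
    std::size_t orphanCount_ = 0;
    std::unordered_map<std::uint64_t, BatchLocation> resident_;
};
```

With a 100-byte buffer, acquiring batches 1 and 2 (40 bytes each) uploads both, acquiring batch 1 again is a reuse hit, and acquiring a third 40-byte batch no longer fits, so the buffer is orphaned and every cached batch has to be uploaded again on its next use.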

peterfilm · June 29, 2011, 5:49am

Brilliant! Thanks for the links. I've read the Barris stuff before, but it was the batch reuse strategy I was more interested in; otherwise buffer streaming is all just for CPU-computed dynamic stuff.
One gem of information came out of those links though: the DX10 performance slides. Slide 14:
Instance data:
ATI: Ideally should come from additional streams (up to 32 with DX10.1)
NVIDIA: Ideally should come from CB indexing.
So it seems NVIDIA have optimised instancing for uniform buffers (the GL counterpart of DX10 constant buffer indexing) rather than instanced arrays. This would explain my experience.
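
To make the two instancing paths concrete, here is a small C++ sketch holding one GLSL 1.40 vertex shader for each approach, plus a helper that computes how many per-instance matrices fit in a uniform block. The identifiers (`kInstancedArrayVS`, `kUniformBlockVS`, `maxInstancesPerDraw`, `InstanceBlock`, `viewProj`) are hypothetical, and the 16384-byte block size is an assumption based on the minimum the GL spec guarantees for GL_MAX_UNIFORM_BLOCK_SIZE. It says nothing about which path is faster on a given GPU; that still has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Path 1: instanced arrays. The per-instance matrix arrives as a vertex
// attribute. On the host, the mat4 takes four attribute slots and each one
// needs glVertexAttribDivisorARB(location + i, 1).
const std::string kInstancedArrayVS = R"glsl(#version 140
uniform mat4 viewProj;
in vec3 position;
in mat4 instanceModel;   // advances once per instance, not per vertex
void main() {
    gl_Position = viewProj * instanceModel * vec4(position, 1.0);
}
)glsl";

// Path 2: uniform buffer indexing. The per-instance matrices live in a
// uniform block (bound with glBindBufferBase(GL_UNIFORM_BUFFER, ...)) and
// the shader picks its own entry by instance index.
const std::string kUniformBlockVS = R"glsl(#version 140
uniform mat4 viewProj;
layout(std140) uniform InstanceBlock {
    mat4 model[256];     // 256 * 64 bytes = 16384 bytes
};
in vec3 position;
void main() {
    gl_Position = viewProj * model[gl_InstanceID] * vec4(position, 1.0);
}
)glsl";

// A mat4 occupies 64 bytes under std140 (four vec4 columns).
constexpr std::size_t kMat4Bytes = 16 * sizeof(float);

// Number of per-instance matrices that fit in one uniform block, which
// caps the instance count of a single draw call on the uniform buffer path.
constexpr std::size_t maxInstancesPerDraw(std::size_t blockBytes) {
    return blockBytes / kMat4Bytes;
}
```

The practical difference is the cap: the uniform buffer path limits one draw to however many instances fit in the block, so larger sets have to be split across several draws with the buffer range rebound in between, while instanced arrays have no such limit.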