Doing away with VBOs - Dangerous?

Hi all!

I have been trying Shader Storage Buffer Objects for a sprite rendering engine I’m making. At one point there was a Vertex Buffer Object which stored the quads’ untransformed positions and texture coordinates. There was also an (immutable) index buffer - I only draw quads, so the index sequence is always the same. The SSBO contained the texture handle (bindless textures) and per-sprite data like rotation, scale, etc. Since attributes like sprite rotation apply to the entire quad, it made sense to use an SSBO instead of storing extra floats per vertex in the VBO. I’m using glDrawElements(), by the way.

Then it kind of dawned on me that the VBO may be wholly redundant. Consider this: a sprite is a quad. It’s the same thing over and over again. Why not put everything in the SSBO and derive all of the quad’s per-vertex attributes from gl_VertexID?
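
Concretely, the idea is a vertex shader along these lines. This is only a sketch of what’s being described: the struct fields, the std430 layout, the corner mapping and the index pattern are illustrative assumptions, and the bindless texture handle is omitted for brevity.

```glsl
#version 430 core

// Illustrative per-sprite record; field names and layout are assumptions.
struct Sprite {
    vec2  position;   // world-space centre
    vec2  halfSize;
    float rotation;
    float _pad;
    vec2  uvOffset;
    vec2  uvScale;
};

layout(std430, binding = 0) readonly buffer SpriteBuffer {
    Sprite sprites[];
};

uniform mat4 uViewProj;
out vec2 vTexCoord;

void main()
{
    // Assuming the index buffer emits indices 4 * sprite + corner (corner in 0..3).
    int sprite = gl_VertexID / 4;
    int corner = gl_VertexID & 3;

    vec2 unit  = vec2(corner & 1, corner >> 1);      // (0,0) (1,0) (0,1) (1,1)
    vec2 local = (unit - 0.5) * 2.0;                 // unit quad in [-1, 1]

    Sprite s = sprites[sprite];
    float c = cos(s.rotation), si = sin(s.rotation);
    vec2 world = s.position + vec2(local.x * c - local.y * si,
                                   local.x * si + local.y * c) * s.halfSize;

    gl_Position = uViewProj * vec4(world, 0.0, 1.0);
    vTexCoord   = s.uvOffset + unit * s.uvScale;
}
```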

So I went and deleted the VBO altogether. No shader inputs, no vertex attribute pointers, nothing. The shader just reads stuff from the SSBO and draws the quads. It does work! I tested it on a new Nvidia GPU, an older AMD GPU and a new integrated Intel GPU. But I have 2 main questions regarding this approach:

  1. Is this…dangerous? I mean, is the usage of a VBO mandatory for a draw call to succeed? I have created a VAO and bound an index buffer to it, with no vertex attributes enabled whatsoever (see the sketch after this list).
  2. Performance-wise, I haven’t been able to verify any speed increase compared to also using a VBO. This does seem a bit weird, to be honest. Does the absence of a VBO somehow slow down the rendering pipeline to such a degree that it offsets any potential speed gains? Should I stick to having a dummy VBO for compatibility reasons?
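
In code, that setup looks roughly like this. It is a sketch using DSA-style GL 4.5 calls; `indexData`, `indexBytes`, `spriteBytes` and `spriteCount` are placeholders.

```cpp
// Attribute-less setup: a VAO that owns only an element buffer, with all
// per-sprite data coming from an SSBO. No glVertexAttribPointer or
// glEnableVertexAttribArray calls anywhere.
GLuint vao, ebo, ssbo;

glCreateVertexArrays(1, &vao);

glCreateBuffers(1, &ebo);
glNamedBufferStorage(ebo, indexBytes, indexData, 0);          // immutable index buffer
glVertexArrayElementBuffer(vao, ebo);

glCreateBuffers(1, &ssbo);
glNamedBufferStorage(ssbo, spriteBytes, nullptr, GL_DYNAMIC_STORAGE_BIT);

// Per frame:
glBindVertexArray(vao);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);          // binding = 0 in the shader
glDrawElements(GL_TRIANGLES, spriteCount * 6, GL_UNSIGNED_INT, nullptr);
```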

No. A VBO isn’t mandatory for a draw call to succeed, and you don’t need to keep a dummy one around for compatibility.

It’s not the “absence of a VBO” that would be a performance issue. The question is whether reading the vertex data from an SSBO is slower than fetching it as vertex attributes. The answer is… maybe?

Vertex specification provides more functionality than just reading from memory. Your vertex format can be quite complex; you can put normalized integers of various sizes into the buffer, and the shader will just get the floating-point equivalents. Some GPUs have specialized hardware to do this. And in those cases, if you exercise this hardware a lot, this may be faster than reading from the SSBO and using shader logic to decompress the vertex attribute.
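
For example, with the fixed-function vertex path a packed, normalized format is just a declaration, and the attribute-fetch stage hands the shader floats in [0, 1]. A sketch with DSA calls and a made-up packed layout (`vao` is assumed to exist; binding the buffer and enabling the attributes is omitted):

```cpp
// Hypothetical packed vertex: the hardware/driver expands the normalized
// integers to floats before the vertex shader ever sees them.
struct PackedVertex {
    float    x, y;        // position:  plain floats
    uint16_t u, v;        // texcoords: normalized unsigned shorts
    uint8_t  r, g, b, a;  // colour:    normalized unsigned bytes
};

glVertexArrayAttribFormat(vao, 0, 2, GL_FLOAT,          GL_FALSE, offsetof(PackedVertex, x));
glVertexArrayAttribFormat(vao, 1, 2, GL_UNSIGNED_SHORT, GL_TRUE,  offsetof(PackedVertex, u));
glVertexArrayAttribFormat(vao, 2, 4, GL_UNSIGNED_BYTE,  GL_TRUE,  offsetof(PackedVertex, r));
```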

Then again, it may not. Some GPUs don’t have specialized hardware for attribute decompression; on those, the VAO’s vertex format effectively just adds some logic to the start of your VS to do the read and decompression within the shader.
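
If you pull the same packed data out of an SSBO yourself, that decompression becomes explicit shader code, e.g. via GLSL’s standard unpack functions. A sketch matching the hypothetical 16-byte layout above:

```glsl
// Manually fetching and decompressing one packed vertex (4 uints = 16 bytes)
// from an SSBO; the layout mirrors the hypothetical PackedVertex above.
layout(std430, binding = 1) readonly buffer PackedVertices {
    uint words[];
};

void fetchVertex(int vertexID, out vec2 pos, out vec2 uv, out vec4 color)
{
    int base = vertexID * 4;                      // 4 uints per vertex
    pos   = vec2(uintBitsToFloat(words[base + 0]),
                 uintBitsToFloat(words[base + 1]));
    uv    = unpackUnorm2x16(words[base + 2]);     // two ushorts -> floats in [0,1]
    color = unpackUnorm4x8(words[base + 3]);      // four ubytes -> floats in [0,1]
}
```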

All that having been said, however, having the CPU do the quad transformations before writing them is also very efficient. Since the transformations for objects can be updated per-frame, you’re still having to write to some buffer object per-frame (and therefore need to employ good buffer object streaming techniques).

And you have to read the transformation data from the object’s storage into the CPU anyway (potentially the slow step in the process) in order to write it to the buffer. So you may as well just do the transformations on the CPU.
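
Whichever buffer you end up streaming into each frame, one simple technique from the usual streaming toolbox is orphaning. A sketch only: `streamVbo`, `PreTransformedVertex` and `buildSpriteVertices()` are hypothetical, `<vector>` is assumed, and orphaning needs a buffer created with glBufferData rather than glBufferStorage (persistently mapped buffers are the usual alternative for immutable storage).

```cpp
// CPU does the per-sprite transforms, then the whole frame's vertices are
// streamed into an orphaned buffer so the driver doesn't have to stall on a
// copy the GPU may still be reading from.
std::vector<PreTransformedVertex> verts = buildSpriteVertices(sprites);
GLsizeiptr bytes = GLsizeiptr(verts.size() * sizeof(PreTransformedVertex));

glBindBuffer(GL_ARRAY_BUFFER, streamVbo);
glBufferData(GL_ARRAY_BUFFER, bytes, nullptr, GL_STREAM_DRAW);   // orphan the old storage
glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts.data());        // upload this frame's data
```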

No, you don’t need a dummy VBO for compatibility. In fact, VBOs used naively can be slower than client arrays (i.e. just re-streaming the geometry from CPU memory every frame).

NVIDIA bindless can make VBOs perform well even with small batches. Or just never issue small batches from VBOs in the first place, which is good practice anyway.