Has anyone of you tried UBOs yet?
I did, on a GF8800GTS with the 182.47 drivers (GL3.1 ‘enabled’).
At first I was a bit confused about what these “uniform block binding points” were for, but now I’m really happy they did it this way! It’s like how you bind VBOs to vertex attributes. You don’t attach one specific buffer directly to a shader. Instead you bind the buffer to a specific “slot”, and then in turn tell the shader which slot each of its uniform blocks should read from. This allows me the following usage pattern:
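To make the two-step indirection concrete, here is a toy model of it in plain C++. This is not real GL code and needs no context: `Context`, `Program` and `resolveBuffer` are made-up names for illustration, and the comments name the GL calls each step is meant to stand in for.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only. The real GL calls are named in the comments.
struct Context {
    // binding point ("slot") -> buffer id
    // models glBindBufferBase(GL_UNIFORM_BUFFER, slot, buffer)
    std::map<unsigned, unsigned> slotToBuffer;
};

struct Program {
    // uniform block name -> binding point ("slot")
    // models glGetUniformBlockIndex + glUniformBlockBinding(program, index, slot)
    std::map<std::string, unsigned> blockToSlot;
};

// Which buffer would the given block read from at draw time?
// Returns 0 if the block is unknown or nothing is bound to its slot.
unsigned resolveBuffer(const Context& ctx, const Program& prog,
                       const std::string& block) {
    auto b = prog.blockToSlot.find(block);
    if (b == prog.blockToSlot.end()) return 0;
    auto s = ctx.slotToBuffer.find(b->second);
    return s == ctx.slotToBuffer.end() ? 0 : s->second;
}
```

The point of the model: swapping the buffer in `slotToBuffer` changes what every program attached to that slot sees, without touching any `Program`.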
- Define 2 shared uniform blocks. They are automatically prepended as a header to every shader source. They provide ‘ModelViewProjection’, ‘ModelView’ and similar matrices. These two uniform blocks replace what used to be the ‘GLSL built-in uniforms’. Instead, I now provide them myself.
- Bind those two uniform blocks of each shader to hardcoded binding points (I call them ‘slots’) 0 and 1.
- Have a central object (‘Frontend’) in the engine which provides what used to be OpenGL’s projection and modelview matrix stacks. Changes to the stacks are tracked.
- Before each draw call (each going through the Frontend as well), check the stacks for changes. If there are changes, upload the new matrices into the UBOs that are currently bound to slots 0 and 1.
- The UBOs are “multi-buffered”, i.e. I have a few of them and they are cycled through in a round-robin fashion. This way I want to avoid the stalls that would happen if I tried to update a UBO which is still in use by in-flight draw commands.
Point (5), the multi-buffering, is where these uniform buffer binding points show their real strength. The Frontend just binds different UBOs to the slots, without having to notify each shader that a rebind has happened. The Frontend doesn’t even have to know which shaders finally take data from the UBOs!
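The change tracking and the round-robin cycling described above could be sketched roughly like this. It is a minimal sketch under assumptions, not my actual engine code: it handles a single matrix and a single slot, the class layout and the names `Upload`, `Bind` and `beforeDraw` are invented for illustration, and the two callbacks stand in for the real GL calls so the logic can be tried without a context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Mat4 = std::array<float, 16>;

class Frontend {
public:
    // upload(buffer, matrix): stands in for glBindBuffer + glBufferSubData
    // bind(slot, buffer):     stands in for glBindBufferBase(GL_UNIFORM_BUFFER, slot, buffer)
    using Upload = std::function<void(unsigned, const Mat4&)>;
    using Bind = std::function<void(unsigned, unsigned)>;

    Frontend(std::vector<unsigned> ring, unsigned slot, Upload upload, Bind bind)
        : ring_(std::move(ring)), slot_(slot),
          upload_(std::move(upload)), bind_(std::move(bind)) {
        assert(!ring_.empty());
    }

    // Any change to the tracked matrix marks the state dirty.
    void setModelView(const Mat4& m) { modelView_ = m; dirty_ = true; }

    // Call before every draw. Uploads only if something changed, and
    // writes into the next buffer of the ring, never the one just used.
    void beforeDraw() {
        if (!dirty_) return;
        unsigned buf = ring_[next_];
        next_ = (next_ + 1) % ring_.size();
        upload_(buf, modelView_);
        bind_(slot_, buf);  // shaders are not notified, only the slot changes
        dirty_ = false;
    }

private:
    std::vector<unsigned> ring_;  // buffer ids, cycled round-robin
    unsigned slot_;               // hardcoded binding point
    Upload upload_;
    Bind bind_;
    Mat4 modelView_{};
    bool dirty_ = true;           // force an upload before the first draw
    std::size_t next_ = 0;
};
```

Note that a fixed-size ring only makes a stall less likely. If the GPU is more draws behind than the ring is long, the oldest buffer may still be in use when it comes around again.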
There are some things that don’t work yet, but I hope it’s just driver bugs. First, as soon as a uniform block appears in the shader source (declared as ‘shared’), the block is ‘active’, even if none of the uniforms inside it is actually referenced by the shader.
My suggestion would be: let the compiler optimize completely unreferenced blocks away. I understand that this won’t work for single uniforms inside shared blocks, but it is perfectly doable for whole blocks! It would allow me to just provide all blocks as a header to each source and then, after glLinkProgram, find out whether the shader really references the blocks (an opportunity to optimize for fewer buffer updates).
Second, the multi-buffered approach is not working yet. It seems the shaders are not properly following the UBO rebinding done via glBindBufferBase(). But I guess this is just a driver bug, too.
Aside from that, I have not found any big problems with GL3.1 yet. I cannot say yet whether using ‘pure GL3.1’ leads to faster rendering. I miss glPushAttrib/glPopAttrib a bit. Sometimes the driver falsely reports a GL error shortly after creating the context. Additionally, I sometimes get a corrupted image if I run two instances of my program.
GL3.1 itself urgently needs better docs. It is very hard to use the spec for quick reference. I sometimes find it easier to read the extension spec for a certain piece of functionality, just because it provides the information I’m looking for in concentrated form. In the GL3.1 spec, everything is interwoven and spread across the whole document - very hard to figure out. Additionally, beginners will have a very hard time with GL3.0+. It takes quite a lot of work today to get even a single triangle on the screen. It’s OK for me, since I already knew most of the stuff, but a beginner will be absolutely lost. That just cries out for a series of tutorials and example code, like those NeHe ones I started with ten years ago.
Well then, that’s my 2 cents… share your experiences…