UBO experiences on NVidia

Have any of you tried UBOs yet?

I did, on a GF8800GTS with 182.47 drivers (GL3.1 ‘enabled’).

At first I was a bit confused about what these “uniform block binding points” were all about, but now I’m really happy they did it this way! It’s like how you bind VBOs to vertex attributes: you don’t attach one specific buffer to a shader; instead, you bind the buffer to a specific “slot” and then tell the shader which slot each uniform block reads from. This allows the following usage pattern:

  1. define 2 shared uniform blocks. They are automatically attached as a header to every shader source and provide ‘ModelViewProjection’, ‘ModelView’, and similar matrices. These two uniform blocks replace what used to be the GLSL built-in uniforms; I now provide those values myself.
  2. bind those two uniform blocks of the shader to hardcoded binding points (I call them ‘slots’) 0 and 1.
  3. have a central object (‘Frontend’) in the engine which provides what in former days were OpenGL’s projection and modelview matrix stacks. Changes to the stacks are tracked.
  4. Before each draw call (each going through the Frontend as well), check the stacks for changes. If there are changes, upload the new matrices into the UBOs that are currently bound to slots 0 and 1.
  5. The UBOs are “multi-buffered”, i.e. I have a few of them and cycle through them in round-robin fashion. This avoids the stalls that would occur if I updated a UBO that is still in use by in-flight draw commands.
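The round-robin cycling in (5) can be sketched in a few lines of C. This is only an illustration of the idea; the ring size and the `UboRing`/`ubo_ring_next` names are mine, not from the engine described above:

```c
#include <assert.h>

/* A tiny round-robin ring of UBO handles (step 5 above).
 * Before each upload we advance to the next buffer, so we never
 * write into a UBO that in-flight draw commands may still read. */
#define UBO_RING_SIZE 3

typedef struct {
    unsigned int buffers[UBO_RING_SIZE]; /* GL buffer names from glGenBuffers */
    int          current;                /* index of the buffer in use */
} UboRing;

/* Advance to the next buffer in the ring and return its GL name. */
static unsigned int ubo_ring_next(UboRing *ring)
{
    ring->current = (ring->current + 1) % UBO_RING_SIZE;
    return ring->buffers[ring->current];
}
```

After `ubo_ring_next()` hands back a buffer name, the Frontend would upload the new matrices into it and rebind it with glBindBufferBase(GL_UNIFORM_BUFFER, slot, name); no shader has to be notified.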

(5) is where these uniform buffer binding points show their real strength. The Frontend just binds different UBOs to the slots, without having to notify each shader that a rebind has happened. The Frontend doesn’t even have to know which shaders ultimately read data from the UBOs!

Some things don’t work yet, though I hope they’re just driver bugs. First, as soon as a uniform block (declared as ‘shared’) appears in the shader source, the block is ‘active’ - even if none of the uniforms inside it is actually referenced by the shader.

My suggestion would be: let the compiler optimize completely unreferenced blocks away. I understand that this won’t work for single uniforms inside shared blocks, but it is perfectly doable for whole blocks! It would allow me to provide all blocks as a header to every source and then, after glLinkProgram, find out whether the shader really references each block (an opportunity to optimize for fewer buffer updates).

The multi-buffered approach is not working yet. It seems the shaders are not properly following the UBO rebinding via glBindBufferBase(). But I guess this is just a driver bug.

Aside from that, I have not found any big problems with GL3.1 yet. I cannot say yet whether using ‘pure GL3.1’ leads to faster rendering. I miss glPushAttrib/glPopAttrib a bit. Sometimes the driver falsely reports a GL error shortly after context creation, and I sometimes get a corrupted image if I run two instances of my program.

GL3.1 urgently needs better docs. It is very hard to use the spec for quick reference. I sometimes find it easier to read the extension spec for a piece of functionality, simply because it provides the information I’m looking for in concentrated form. In the GL3.1 spec, everything is interwoven and spread across the whole document - very hard to figure out. Additionally, beginners will have a very hard time with GL3.0+. It takes quite a lot of work today to get even a single triangle on the screen. It’s OK for me, since I already knew most of the stuff, but a beginner will be absolutely lost. That just cries out for a series of tutorials and example code, like the NeHe ones I started with ten years ago :slight_smile:

well then, that’s my 2 cents… share your experiences…

Nice, thanks for the info!

What I wonder: is it actually necessary to do multi-buffered UBO updates? If they are handled just like VBOs, updating a buffer that is still in use should be handled transparently by the driver. Just orphan it with glBufferData (…NULL…) and the driver knows that it does not need to sync its contents with the new data. Or am I missing something here?

I agree with you that GL3 is very hard for beginners. I am currently replacing the matrix stacks with my own implementation. I was very surprised to see that GL3 did away not only with the STACKS, but with ALL MATRICES altogether! Maybe they should have left the ModelView and Projection matrices (+ glTranslate, glRotate, glScale) in and only removed the stacks. I am not sure whether that was a good idea or not.


Using the ‘packed’ qualifier for the blocks should allow elimination of individual uniforms through non-use or dead-code elimination. Whether the runtime actually does so can’t be predicted, so you have to query after linking and see what the outcome was.


The data storage for a uniform block can be declared to use one of
three layouts in memory: packed, shared, or std140.

  - "packed" uniform blocks have an implementation-dependent data
    layout for efficiency, and unused uniforms may be eliminated by
    the compiler to save space.

  - "shared" uniform blocks, the default layout, have an implementation- 
    dependent data layout for efficiency, but the layout will be uniquely
    determined by the structure of the block, allowing data storage to be
    shared across programs.

  - "std140" uniform blocks have a standard cross-platform cross-vendor
    layout (see below). Unused uniforms will not be eliminated.

I don’t know if multiple buffers are the way to go; your suggestion might work as well. We need some info from nVidia/ATI here.
The removal of all the matrix stuff is a good thing, imho. It leaves only one way to provide the needed matrices to the shader. And since it also removed most of the built-in stuff, we programmers never need to worry anymore whether it’s better to provide matrices (and other data) through built-in uniforms or our own.
A third-party library can provide the same functionality in the future.

I am not asking for the elimination of individual uniforms inside a uniform block. I perfectly understand that this can’t be allowed for shared uniform blocks (the ones I was talking about). I just want the driver to recognize when none of the individual uniforms inside a block is used at all, and then remove that block (as a whole) from the list of active blocks.

I think this is currently just a driver issue. The only reason my preferred behaviour is not implemented yet might be that otherwise I could not create an “empty” reference shader just for retrieving uniform offsets and block sizes (the driver would simply eliminate the blocks, leaving me no way to get that information). I would have to create a dummy shader which actually references the uniform blocks, just as EXT_bindable_uniform required me to do. But this is no biggie… I just need to know that I have to do this - and with the current drivers/specs I already have to create that dummy shader anyway.
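For reference, such a dummy shader only has to touch one uniform from each block to keep the blocks active for querying. A sketch in C holding an illustrative GLSL 1.40 source string; the block and uniform names here are invented for the example, not taken from the posts above:

```c
#include <assert.h>
#include <string.h>

/* A minimal dummy vertex shader that references one uniform from each
 * shared block, so the driver keeps both blocks 'active' and offsets/
 * block sizes can be queried after linking. Names are illustrative. */
static const char *dummy_shader_src =
    "#version 140\n"
    "uniform Transforms { mat4 ModelViewProjection; mat4 ModelView; };\n"
    "uniform Material   { vec4 DiffuseColor; };\n"
    "void main() {\n"
    "    /* touch one uniform per block so neither is optimized away */\n"
    "    gl_Position = ModelViewProjection * vec4(DiffuseColor.rgb, 1.0)\n"
    "                + ModelView * vec4(0.0);\n"
    "}\n";
```

One would compile and link this string into a throwaway program, then use glGetUniformBlockIndex / glGetActiveUniformBlockiv on it to retrieve the layout information.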

@skynet, thanks for the clarification, that’s an interesting point that I don’t think had been considered previously. It might not be very difficult to extend UBO to provide that kind of behavior (whole-block activity marking).

Without seeing your code, I suspect you ran into a driver bug that we already fixed. That fix will be in the next GL 3.1 driver update.


An updated driver is now available. Your bug should be fixed.



Thank you!!
Indeed, the UBO related bug I reported got fixed :slight_smile:

Do you know when 3.1-instrumented drivers will be included in PerfKit?

Yeah, those would be useful. I am currently using the 2.1 drivers from December to get GLexpert support.

I’m trying to use UBO’s on a GeForce 8600 with nVidia’s 3.1 drivers, but none of the drivers I’ve tried support the uniform_binding_object extension. I’ve tried 185.52 (from nVidia’s OpenGL 3.1 page) and the very latest 190.38. I’ve tried in both Windows XP x86 and Windows 7 x64.

If anyone could point me in the right direction, I would be extremely grateful.



The extension is “uniform buffer object”.

I tried the latest 190.38 drivers. Indeed, ARB_u_b_o is not listed among the extensions (might be a bug), but the entry points are all there and, as far as I have tested them, working. I have encountered two GLSL 1.4 related bugs, though:
a) #extension GL_ARB_uniform_buffer_object : enable falsely reports that this extension is not supported
b) invariant gl_Position; confused the compiler into reporting an internal error

Both bugs were reported to NVidia and are probably being fixed as we speak.

Another really nice thing is that a lot of optimization seems to have taken place between 182.52 and 190.38. Whereas I needed a multi-buffered approach with the old drivers (to the point where I basically had one UBO per matrix update within a single frame), the 190s do a very good job of buffering/caching the updates. Just one UBO, updated via glBufferData() whenever I need to upload new matrices to the shaders, is enough to reach maximum performance.

have encountered two GLSL 1.4 related bugs, though:
a) #extension GL_ARB_uniform_buffer_object : enable falsely reports that this extension is not supported

Are you sure this is false? GLSL 1.4 has ARB_UBO as core. It is not an extension, and therefore the extension is not supported.

Sorry, ignore that previous post. I didn’t realize wglGetProcAddress would return valid pointers to the new 3.1 functions if the extension wasn’t in the list.

EDIT: Woops, I didn’t see your replies until after I added my own. My initial post was the last one on the first page, and somehow I missed the “Page 1 - 2” links. And yes, I meant to type uniform_buffer_object. I really appreciate the comments everyone. Thanks.

Well, to me it seemed common practice to leave an extension alive as an extension even after it goes into core… especially in the case of UBO, where core functionality and extension are 100% identical. At least it’s just a warning I get, not an error:
: warning C7508: extension ARB_uniform_buffer_object not supported

The same seems to be the case with the extension string. UBO is not listed as an extension, but since the driver reports GL version 3.1, we should be able to safely assume it’s there. My guess is this oddity is just a slip and will be fixed. Why should UBO be treated any differently than any other extension?

Hi skynet,

Correct. Since UBO is part of OpenGL 3.1 core, it has to be supported by the driver otherwise the version number should not be 3.1.

NVIDIA never shipped UBO as an extension, hence you will not find it in the extension string.

The more normal path for an extension is to go from EXT -> ARB -> core inclusion, over a period of time. In this case, removing the extension name from the extension string would break already shipping applications, hence we leave it in.

ARB_UBO was released at the same time as core OpenGL 3.1 and therefore there’s no need to have ARB_UBO in the extension string.

Thus first check the OpenGL version and from that deduce what functionality is supported. Then check the extension string for any additional functionality you need.
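That “version first, extensions second” advice can be sketched as a tiny helper. The version string in the note below is an invented example of the usual GL_VERSION format, not output from a specific driver:

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a GL_VERSION string reports at least major.minor.
 * Core functionality (like UBO in 3.1) is then guaranteed even if no
 * corresponding extension appears in the extension string. */
static int gl_version_at_least(const char *version, int major, int minor)
{
    int maj = 0, min = 0;
    if (sscanf(version, "%d.%d", &maj, &min) != 2)
        return 0;
    return maj > major || (maj == major && min >= minor);
}
```

glGetString(GL_VERSION) returns strings such as "3.1.0 NVIDIA 190.38"; if gl_version_at_least(version, 3, 1) succeeds, the UBO entry points may be used without consulting the extension string, and the extension string then only needs to be checked for additional functionality.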


(with my NVIDIA hat on)