default vs named uniform block

Hello,

Does the named uniform block storage have any performance disadvantage over the default uniform block storage?
I noticed that the available memory size for named UBs far exceeds the size of the default UB. Does this mean that the default UB is located in some on-chip fast memory whereas the named blocks are in the (slower but more abundant) off-chip video memory?

thanx in advance,
lucho

Has no one faced this question before?
When you have to choose where to place your uniforms and how to manage them, what do you do and why?

I find the named blocks more flexible and easier to work with, and they can be reused between different shader/program objects, but I am concerned about possible speed disadvantages.
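To make it concrete, here is roughly what the two forms look like in shader source (just a sketch; the uniform and block names are invented for illustration):

```c
/* Hypothetical fragment shader source, embedded the usual way in C code. */
const char *fs_src =
    "#version 150\n"
    "uniform vec4 tint;          // default uniform block (a plain uniform)\n"
    "uniform LightBlock {        // named uniform block, backed by a buffer object\n"
    "    vec4 light_pos;\n"
    "    vec4 light_color;\n"
    "};\n"
    "out vec4 color;\n"
    "void main() { color = tint * light_color; }\n";
```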

I guess this is a very vendor-dependent matter.
Anyone with suggestions/thoughts on this?

AFAIK GPUs have on-chip storage for constant data of a particular size (usually 16 to 64 KB). This is used both for the default uniform block and for named uniform blocks (the ones backed by a buffer object); however, the actual implementation can be vendor-dependent.

The question is where the default uniform block uniforms are stored. As far as I can tell, there are two possible approaches:

  1. Every program object has a driver-allocated buffer object that holds its default uniform block data, which is accessed in the same way as named uniform blocks.
  2. Default uniform block data is baked into the binary code of the shader, and each time you modify a uniform the driver patches the binary and re-uploads the program binary to the GPU if that program is currently bound.

While in theory both approaches seem possible, I would think option #1 is what vendors actually use, as it places less of a burden on driver developers.

Constant buffer accesses are rather fast (much faster than texture or global memory accesses, usually somewhere between 1x and 25x), but they are about 4x-5x slower than GPRs (general purpose registers). So if drivers did use option #2, default uniform blocks could be faster. However, considering that this is just theory and drivers are unlikely to do such optimizations, I would treat the two as having the same performance from a GPU processing point of view. Furthermore, updating the data of the default uniform block can only be done with multiple GL calls, which have their own CPU overhead, and even if drivers did implement option #2, I wonder how heavyweight it would be to reload the shader every time a uniform changes (at least between draw commands).
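To illustrate the client-side difference (a sketch only; it assumes a GL 3.1+ context and that prog, a linked program, and ubo, a uniform buffer of the right size, already exist):

```c
GLfloat tint[4] = { 1.0f, 0.5f, 0.25f, 1.0f };
struct { GLfloat light_pos[4]; GLfloat light_color[4]; } block_data =
    { { 0.0f, 10.0f, 0.0f, 1.0f }, { 1.0f, 1.0f, 1.0f, 1.0f } };

/* Default uniform block: one GL call per uniform, and (before GL 4.1's
   glProgramUniform* entry points) the program must be the one in use. */
glUseProgram(prog);
glUniform4fv(glGetUniformLocation(prog, "tint"), 1, tint);

/* Named uniform block: a single buffer update covers all of its members,
   independent of which program happens to be bound. */
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(block_data), &block_data);
```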

So to answer your question:
There is most probably little to no difference between the storage of the default uniform block and named uniform blocks, but the latter are more convenient to use and provide better client-side performance. However, this is just theory; the vendors may be able to provide more accurate information.

Here is the line of thought that led me to my suspicion about named block performance:
In OpenCL and CUDA there is a memory type named “constant”, which I suspect represents the same hardware as the OpenGL default uniforms. It is quite small (64 KB on NVIDIA). The documentation is rather vague, but somewhere it is mentioned that this is a small on-chip constant cache and is as fast as registers.
If that small CUDA/OpenCL constant memory really is the same as the default uniform block in OpenGL, that would mean named blocks may be at a serious speed disadvantage: they can be very large, so they cannot fit in this constant cache (which is small) and will have to live in the slow video memory with no on-chip caching. That would make them quite slow and would consume precious memory bandwidth.
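For reference, this is the OpenCL feature I mean (a made-up kernel just to show the qualifier; none of the names come from real code):

```c
/* OpenCL C, not GLSL: data reached through a __constant pointer lives in
   the small constant address space discussed above (e.g. 64 KB on NVIDIA). */
__kernel void shade(__global float4 *out,
                    __constant float4 *material)
{
    size_t gid = get_global_id(0);
    out[gid] = material[0] * out[gid];
}
```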

I’m pretty sure the constant memory model of OpenCL is the equivalent of OpenGL uniform buffer objects, so named uniform blocks should be as fast as that constant memory store.

I’m not sure about NVIDIA’s constant memory architecture, but I read in ATI’s OpenCL programming guide that constant memory accesses are about 4 times slower than registers, even though it is an on-chip memory cache. I wouldn’t expect NVIDIA’s on-chip constant memory to be as fast as registers either, but it is definitely the second fastest memory type after GPRs.

Anyway, this constant store can be used by both the default and named uniform blocks.

That’s exactly what I want to know: whether it’s true :)
How can we be sure of it?
Consider this: in OpenCL the constant memory can only be very small, whereas named uniform blocks can be far larger. Why is that, if both are based on the same hardware?

Anyway, I have already decided to use named blocks because, even if they are slower, I figured the burden of re-specifying all the uniforms whenever I change the active program object is too much.
The fact that the default uniform block is invalidated every time you change the active program object looks like too big a disadvantage to me, which makes me think I may have missed something.
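What I mean by reusing them looks roughly like this on the client side (a sketch; progA, progB, ubo and the block name "SharedData" are made up):

```c
/* Both programs declare 'uniform SharedData { ... };' in their shaders. */
GLuint idxA = glGetUniformBlockIndex(progA, "SharedData");
GLuint idxB = glGetUniformBlockIndex(progB, "SharedData");

/* Point both blocks at the same binding point... */
glUniformBlockBinding(progA, idxA, 0);
glUniformBlockBinding(progB, idxB, 0);

/* ...and attach the buffer there once. Updating ubo now updates the data
   seen by both programs, with no per-program glUniform* calls. */
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);
```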

That’s exactly what I want to know: whether it’s true :)
How can we be sure of it?

How can you be sure of anything? You can’t. The only real way to know the performance of something is to benchmark it.

The fact that the default uniform block is invalidated every time you change the active program object looks like too big a disadvantage to me, which makes me think I may have missed something.

What? The default uniform data is stored in the program. It isn’t invalidated or changed unless you explicitly change it.

Alright, it’s not invalidated, but it’s not persistent global state the way a named block is.
Imagine what happens when you want to use many program objects with common data: you have to replicate the uniforms across all the programs, and you have to set those uniforms on every program object each time their values need to change. Plus there may be a burden on the driver too: when you switch programs it must swap the uniforms, and there is no telling how much that can cost.
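Concretely, without named blocks you end up with something like this every time a shared value changes (a sketch; the program handles and the uniform name are made up):

```c
const GLfloat light_dir[3] = { 0.0f, -1.0f, 0.0f };
GLuint programs[] = { progA, progB, progC };   /* assumed to already exist */

/* The same value has to be pushed into every program's default uniform block. */
for (size_t i = 0; i < sizeof(programs) / sizeof(programs[0]); ++i) {
    glUseProgram(programs[i]);
    glUniform3fv(glGetUniformLocation(programs[i], "light_dir"), 1, light_dir);
}
```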

If you instead use a single program object but change its attached shaders, then you have to re-link the program object, which I think invalidates its uniforms. So that’s not an option either (not to mention that the re-linking itself can be a very heavyweight operation).

Imagine what happens when you want to use many program objects with common data: you have to replicate the uniforms across all the programs, and you have to set those uniforms on every program object each time their values need to change.

Yes, but that’s how every program that has used GLSL has worked since GLSL was created.

Plus there may be a burden on the driver too: when you switch programs it must swap the uniforms, and there is no telling how much that can cost.

Yes, but this is how it has always been. This is generally how it works in D3D too. No one has made substantive claims that this is a particularly onerous burden on the implementation.

Considering that every uniform block access burns GPU time, it makes more sense (in the absence of actual profiling data) to minimize shader processing time over whatever has to be done to switch programs.

If you instead use a single program object but change its attached shaders, then you have to re-link the program object, which I think invalidates its uniforms.

Why would you ever want to do that? In general, I would consider re-linking any program object to be bad form.

What do you mean by “this is how it has always been”?
It is not this way with named blocks, and it is not this way in D3D either. In D3D the shader constants are not invalidated when you change the shader; they are independent state.

What do you mean by “this is how it has always been”?

Exactly what it says. GLSL has always worked like this, since its inception. If this were a significant performance concern, I suspect NVIDIA would have an extension or something available to correct it.

Also, I would point out that ARB_vertex_program works the same way (though it does also have global program parameters that are part of context state). So it’s not like this is new or anything.

In D3D the shader constants are not invalidated when you change the shader; they are independent state.

Shader constants aren’t made invalid when you do anything in OpenGL either (except re-linking a program, but that’s something you shouldn’t be doing). Though you are right: in D3D9, shader “constants” are part of the state block, not the shader objects themselves. And in D3D10, buffer-backed constant blocks (the equivalent of UBOs) are the only option.

Also, I’d like to revise my “every uniform block access burns GPU time” statement. This is how it might work.

For any stage of a program, it is statically determinable how much uniform space that stage takes up. If that space is less than the local memory available to that stage, then there’s no reason the implementation can’t just copy the data from the buffer object into local memory and use it from there.

Therefore, it is possible that accesses to uniform block memory take no longer than accesses to non-block uniforms (outside of the copy to local memory, but that is small enough to be negligible).

The fact that GLSL has always worked like this does not mean it is the best possible way, or that there aren’t any performance issues associated with it.
In fact, the vendors did introduce an extension (now part of the core API) that allows a different way of handling uniforms, namely the named uniform blocks. So maybe they too thought the existing mechanism was not good enough.

The fact that GLSL has always worked like this does not mean it is the best possible way, or that there aren’t any performance issues associated with it.

It doesn’t mean there are performance issues either.

In fact, the vendors did introduce an extension (now part of the core API) that allows a different way of handling uniforms, namely the named uniform blocks.

Which means nothing as far as performance is concerned. There’s no evidence that UBOs are faster in any way than setting uniforms directly. There’s no evidence that they’re slower either, but there is one inescapable fact: implementations have had more time to optimize regular uniforms than UBOs. Therefore, given no evidence either way, it makes sense to default to the non-UBO case unless you have a specific need that UBOs address (more uniform space, shared uniform data, etc).

Plus, I would point out that you still have to set the texture uniforms manually.
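That is, something like this still has to be done per program (a sketch; the program handle and sampler name are made up):

```c
/* Sampler uniforms live in the default uniform block, so each program
   needs its own call to point the sampler at a texture unit. */
glUseProgram(prog);
glUniform1i(glGetUniformLocation(prog, "diffuse_tex"), 0); /* texture unit 0 */
```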

This sounds fair enough; it’s nearly what I think. Unfortunately I have a specific need for shared uniform data, and that’s why I need to use named blocks. But I am worried about possible speed disadvantages they may have.

Plus, I would point out that you still have to set the texture uniforms manually.

Those are a completely different matter. They are treated as uniforms only for consistency of the API, but they have nothing to do with ordinary uniforms. I’m not concerned about them at all.