Layout in Uniform Buffer Object

Utumno · August 22, 2020, 12:04am

OpenGL ES 3.0.

I am currently close to exceeding the guaranteed number of uniforms (AFAIK the spec guarantees 1024) in my vertex shader. So far I’ve been sending individual uniforms and uniform arrays. Now, in an effort to fix all the problems with ‘max number of uniforms exceeded’ once and for all, I am trying to convert all of that to either a Uniform Buffer Object or a Shader Storage Buffer Object.

Out of the two, I understand a UBO is recommended since it is supposedly faster than an SSBO.
So I’ve created a branch in my code where I am trying to move all uniforms in my vertex shaders to UBOs.

The big problem here is that there seems to be no ‘good’ layout for the UBO. I can choose between ‘std140’ (which aligns everything to 16 bytes), ‘shared’, or ‘packed’ (which, as I understand it, do not guarantee anything, but in practice, in my tests on a few devices, seem to align stuff to 4 bytes).

The basic types of the uniforms which will become parts of the UBO are arrays of ints and arrays of vec4s. So if it were really true that ‘packed’ aligns such stuff to 4 bytes, then everything on the CPU side would be tightly packed, which means that after the change to a UBO I’d still be writing to exactly the same pieces of memory on the CPU side.

Otherwise, I’d need to do a very complicated dance: first guess what the alignment might be, allocate my CPU-side buffers, and fill them with data. Then the shaders eventually get compiled, I find out the real alignment, it turns out not to be what I expected, and I need to reallocate all CPU-side memory. To make matters worse, during rendering I now need to write to my CPU-side buffers with ‘gaps’, i.e. write one float, skip as many as the alignment requires, write another one, etc.

I’ve already tried this, and this whole dance causes a revolution in the CPU-side code and measurably slows it down (I tested by setting the layout in the shader to ‘std140’ while pretending on the CPU side that I don’t know the alignment will turn out to be 16 bytes).
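
For illustration, the ‘write with gaps’ step described above might look roughly like the C++ sketch below. The helper name padArray is hypothetical, and the stride is assumed to come either from the std140 rules (16 bytes per element for an int array) or from a GL_UNIFORM_ARRAY_STRIDE query after linking; this is not the poster's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies tightly packed 32-bit ints into a buffer in
// which consecutive array elements start strideBytes apart.
// For an int[] inside a std140 block the stride is 16 bytes.
std::vector<std::uint8_t> padArray(const std::vector<std::int32_t>& src,
                                   std::size_t strideBytes)
{
    assert(strideBytes >= sizeof(std::int32_t));

    // Zero-filled, so the gaps between elements hold defined bytes.
    std::vector<std::uint8_t> dst(src.size() * strideBytes, 0);

    for (std::size_t i = 0; i < src.size(); ++i) {
        // One 4-byte write per element, then skip the rest of the stride.
        std::memcpy(dst.data() + i * strideBytes, &src[i],
                    sizeof(std::int32_t));
    }
    return dst;
}
```

With a 16-byte stride, three ints occupy 48 bytes instead of 12, which is the extra memory and the scattered writes the post complains about.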

In short, trying to use a UBO seems to cause a lot of problems. So maybe an SSBO? How much slower is it? Are there any other options?

Alfonse_Reinheart · August 22, 2020, 2:36am

You can always emulate an array of integers with an array of uvec4s. You simply have to transform the index into two indices: one for the element of the uvec4 array, and one for the component within that element. This is pretty trivial:

int component_ix = index % 4;
int array_ix = index / 4;

Under std140, an array of uvec4s in GLSL has the same layout as a C++ array of GLuint with four times as many elements.
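
A CPU-side sketch of the same index split, to show why the flat array needs no repacking. The names GLuintLike, PackedIndex, splitIndex, uvec4Count and readPacked are made up for this example, and GLuintLike merely stands in for GLuint so that the snippet compiles without GL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for GLuint (an unsigned 32-bit integer) so no GL header is needed.
using GLuintLike = std::uint32_t;

// Position of one logical integer inside an array of uvec4s.
struct PackedIndex {
    std::size_t arrayIx;      // which uvec4
    std::size_t componentIx;  // which component (0..3) of that uvec4
};

// Same arithmetic as the two shader lines above.
constexpr PackedIndex splitIndex(std::size_t index)
{
    return PackedIndex{index / 4, index % 4};
}

// Number of uvec4 elements needed to hold `count` integers (rounded up).
constexpr std::size_t uvec4Count(std::size_t count)
{
    return (count + 3) / 4;
}

// Reads logical element `index` by going through the uvec4 view.
// arrayIx * 4 + componentIx is just `index` again, so the flat CPU-side
// array can be uploaded unchanged.
inline GLuintLike readPacked(const GLuintLike* data, std::size_t count,
                             std::size_t index)
{
    assert(index < count);
    const PackedIndex p = splitIndex(index);
    return data[p.arrayIx * 4 + p.componentIx];
}
```

If the integer count is not a multiple of four, the CPU-side array should be padded up to uvec4Count(count) * 4 entries so that the last uvec4 is fully backed by memory.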

Utumno · August 22, 2020, 10:08am

Uh, so the following

layout (shared) uniform vec4Uniforms
{
    vec4 vEffectUni[MAX_NUM_EFFECT];
};

(or ‘uvec4’) is guaranteed to be tightly packed, i.e. without any gaps? That would solve all the problems.

EDIT: Indeed, I have now actually read section 7.6.2.2 ‘Standard Uniform Block Layout’ of the OpenGL ES spec, and points 1, 2, 3 and 4 imply that an array of {u}vec4s would be tightly packed even with ‘layout (std140)’. I need to try this!
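
A small C++ sketch of what that implies on the CPU side, assuming 4-byte floats and a placeholder value for MAX_NUM_EFFECT. The struct names Vec4 and Vec4Uniforms and the helper std140Vec4Offset are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder; the real value comes from the application.
constexpr std::size_t MAX_NUM_EFFECT = 4;

// CPU-side mirror of a GLSL vec4: four 4-byte floats, no padding.
struct Vec4 {
    float x, y, z, w;
};

// CPU-side mirror of the uniform block holding only the vec4 array.
struct Vec4Uniforms {
    Vec4 vEffectUni[MAX_NUM_EFFECT];
};

// Under std140 a vec4 has a base alignment of 16 bytes, and an array of
// vec4s has an array stride of 16 bytes, so the elements are contiguous.
static_assert(sizeof(Vec4) == 16, "Vec4 must be exactly 16 bytes");
static_assert(sizeof(Vec4Uniforms) == 16 * MAX_NUM_EFFECT,
              "the array must contain no gaps");

// Byte offset of element i inside the block under std140.
inline std::size_t std140Vec4Offset(std::size_t i)
{
    assert(i < MAX_NUM_EFFECT);
    return i * 16;
}
```

If both static_asserts hold, the existing tightly packed CPU-side memory matches the std140 layout of this block byte for byte.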

Alfonse_Reinheart · August 22, 2020, 1:26pm

No, nothing is guaranteed because you used shared. Only std140 has explicit layout guarantees.

Basically, you should pretend that shared and packed don’t exist.

Dark_Photon · August 23, 2020, 3:02pm

Related: It sure would be useful if we had a GL extension to add std430 packing for UBOs in GLSL (GL or GL-ES).

This already exists in GLSL (Vulkan) in Vulkan 1.2 or with this extension:

VK_KHR_uniform_buffer_standard_layout
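
To make the difference concrete, here is a rough C++ sketch of the array stride that std140 and std430 give arrays of scalars and vectors with 4-byte components. The names Layout, baseAlignment and arrayStride are made up for this example, and the sketch covers only this simple case (no matrices, structs, or double-precision types).

```cpp
#include <cassert>
#include <cstddef>

enum class Layout { Std140, Std430 };

// Base alignment in bytes of a scalar or vector with 4-byte components:
// 4 for a scalar, 8 for a 2-component vector, 16 for 3 or 4 components.
inline std::size_t baseAlignment(std::size_t components)
{
    assert(components >= 1 && components <= 4);
    if (components == 1) return 4;
    if (components == 2) return 8;
    return 16;
}

// Array stride in bytes for an array whose elements are such a scalar
// or vector.
// std140 rounds the element alignment up to that of a vec4 (16 bytes);
// std430 drops that rounding and uses the element's own base alignment.
inline std::size_t arrayStride(std::size_t components, Layout layout)
{
    const std::size_t align = baseAlignment(components);
    if (layout == Layout::Std140) {
        return align < 16 ? 16 : align;
    }
    return align;
}
```

For an int array this gives a stride of 16 bytes under std140 but 4 bytes under std430, while a vec4 array has a stride of 16 bytes under both.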