OpenGL ES 3.0.
I am currently close to exceeding the guaranteed number of uniforms in my vertex shader (AFAIK the spec guarantees 1024 components, i.e. GL_MAX_VERTEX_UNIFORM_COMPONENTS). So far I’ve been sending individual uniforms and uniform arrays. Now, in an effort to fix all the problems with ‘max number of uniforms exceeded’ once and for all, I am trying to convert all of that to either a Uniform Buffer Object (UBO) or a Shader Storage Buffer Object (SSBO).
Of the two, I understand a UBO is recommended, since it is supposedly faster than an SSBO.
So I’ve created a branch in my code where I am trying to move all uniforms in my vertex shaders to UBOs.
The big problem here is that there seems to be no ‘good’ layout for the UBO. I can choose between ‘std140’ (which pads every array element out to 16 bytes), ‘shared’ or ‘packed’ (which, as I understand it, do not guarantee anything, but in practice, in my tests on a few devices, seem to align stuff to 4 bytes).
The basic types of the uniforms which will become parts of the UBO are arrays of ints and arrays of vec4s. So if it were really true that ‘packed’ aligns such stuff to 4 bytes, then everything on the CPU side would be tightly packed, which means that after the switch to a UBO I’d still be writing to exactly the same pieces of memory on the CPU side.
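To make the difference concrete, here is a minimal sketch (in Python only for brevity; the arithmetic is the same in any CPU-side language) that computes member offsets and array strides for a block made only of int arrays and vec4 arrays. It applies the std140 rule that an array element's stride is rounded up to the size of a vec4, and, for comparison, the tight 4-byte packing that ‘packed’ happened to produce in the tests described above (an observation, not something the spec guarantees). The member names and counts are hypothetical.

```python
# Sketch: offsets/strides of int[] and vec4[] members in a uniform block.
# Assumption: the block holds only arrays of int and arrays of vec4.

INT_SIZE = 4    # GLSL int is 32 bits
VEC4_SIZE = 16  # four 32-bit floats


def round_up(value, multiple):
    return (value + multiple - 1) // multiple * multiple


def layout(members, std140):
    """Return ({name: (offset, array_stride)}, total_size).

    members: list of (name, element_size, count) tuples.
    std140=True: array stride and alignment rounded up to 16 bytes.
    std140=False: tight 4-byte packing (observed, not guaranteed).
    """
    result = {}
    offset = 0
    for name, elem_size, count in members:
        if std140:
            stride = round_up(elem_size, VEC4_SIZE)
            offset = round_up(offset, VEC4_SIZE)
        else:
            stride = elem_size
        result[name] = (offset, stride)
        offset += stride * count
    return result, offset


# Hypothetical block: int flags[10]; vec4 colors[4]; int ids[3];
block = [("flags", INT_SIZE, 10), ("colors", VEC4_SIZE, 4), ("ids", INT_SIZE, 3)]
tight, tight_size = layout(block, std140=False)
padded, padded_size = layout(block, std140=True)
print(tight_size, padded_size)  # 116 272
```

The vec4 arrays keep a 16-byte stride under both layouts; only the int arrays pick up 12 bytes of padding per element under std140, although every member that follows them shifts as a result.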
Otherwise, I’d need to do a very complicated dance: first guess what the alignment might be, allocate my CPU-side buffers and fill them with data; then, once the shaders eventually get compiled, figure out the actual alignment, and if it turns out not to be what I expected, reallocate all CPU-side memory. To make matters worse, during rendering I would then need to write to my CPU-side buffers with ‘gaps’, i.e. write one float, skip however many bytes the alignment requires, write the next one, etc.
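The ‘write with gaps’ step can be sketched as below, again in Python for brevity. It assumes the offset and array stride of each member are already known; in a real ES 3.0 program they would be queried after linking with glGetUniformIndices and glGetActiveUniformsiv (GL_UNIFORM_OFFSET, GL_UNIFORM_ARRAY_STRIDE). The function names are hypothetical, and the native byte order is assumed to match what the GPU reads.

```python
import struct


def write_int_array(buf, offset, stride, values):
    """Write 32-bit ints into buf starting at offset, one every stride bytes."""
    for i, value in enumerate(values):
        struct.pack_into("=i", buf, offset + i * stride, value)


def write_vec4_array(buf, offset, stride, vectors):
    """Write vec4s (four 32-bit floats each) into buf, one every stride bytes."""
    for i, vec in enumerate(vectors):
        struct.pack_into("=4f", buf, offset + i * stride, *vec)


buf = bytearray(48)  # room for int[3] at a std140 stride of 16 bytes
write_int_array(buf, 0, 16, [1, 2, 3])
print(struct.unpack_from("=i", buf, 16)[0])  # 2
```

With a stride of 4 the int loop writes exactly the bytes of a plain contiguous array, which is the tightly packed case; with a stride of 16 it leaves 12 untouched bytes after every element, and the per-element offset arithmetic is the extra CPU work.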
I’ve already tried this, and this whole dance causes a revolution in the CPU-side code and measurably slows it down (I tested it by setting the layout in the shader to ‘std140’ while pretending on the CPU side that I don’t know the alignment will turn out to be 16 bytes).
In short, trying to use a UBO seems to cause a lot of problems. So maybe an SSBO (although those require OpenGL ES 3.1)? How much slower is it? Are there any other options?