Interpretation of std140 layout

chlee3211 · January 13, 2021, 12:57pm

I’m using the ARB_uniform_buffer_object specification (https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_uniform_buffer_object.txt) as my reference for the std140 layout.

I have a question about how to read the std140 layout rules, especially the phrase “rounded up to the base alignment of a vec4”:

      (4) If the member is an array of scalars or vectors, the base alignment
          and array stride are set to match the base alignment of a single
          array element, according to rules (1), (2), and (3), and rounded up
          to the base alignment of a vec4. The array may have padding at the
          end; the base offset of the member following the array is rounded up
          to the next multiple of the base alignment.

      (9) If the member is a structure, the base alignment of the structure is
          <N>, where <N> is the largest base alignment value of any of its
          members, and rounded up to the base alignment of a vec4. The
          individual members of this sub-structure are then assigned offsets 
          by applying this set of rules recursively, where the base offset of
          the first member of the sub-structure is equal to the aligned offset
          of the structure. The structure may have padding at the end; the 
          base offset of the member following the sub-structure is rounded up
          to the next multiple of the base alignment of the structure.

Rule (4) says that the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. What confuses me is that the rule first refers to the element’s base alignment under rules (1), (2), and (3), and only then rounds it up to the base alignment of a vec4.

Reading only the first part, I would conclude that an array of scalars or vectors takes the base alignment given by rules (1), (2), and (3). For example, if I declare float foo[4] in a uniform block, I would expect its base alignment to be 4. But the second part, the rounding up to the base alignment of a vec4, changes the outcome entirely: it makes the base alignment of the array equal to that of a vec4, which is 16.

So why does the specification mention rules (1), (2), and (3) at all if the base alignment ends up being 16 anyway? I think it would be clearer to say simply that the base alignment of an array is the base alignment of a vec4.
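To make the two steps of rule (4) concrete, here is a minimal sketch of the arithmetic for the 32-bit types. The helper names are made up for illustration and are not part of any OpenGL API; sizes are in bytes.

```python
# Sketch of std140 rule (4) for arrays of 32-bit scalars and vectors.
# Helper names are illustrative only, not part of any OpenGL API.

VEC4_ALIGN = 16  # base alignment of a vec4, in bytes


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def element_base_alignment(components, scalar_size=4):
    """Rules (1)-(3): N for a scalar, 2N for a 2-vector, 4N for a 3- or 4-vector."""
    factor = {1: 1, 2: 2, 3: 4, 4: 4}[components]
    return factor * scalar_size


def array_base_alignment(components, scalar_size=4):
    """Rule (4): the element's base alignment, rounded up to that of a vec4."""
    return round_up(element_base_alignment(components, scalar_size), VEC4_ALIGN)


for name, components in [("float", 1), ("vec2", 2), ("vec3", 3), ("vec4", 4)]:
    before = element_base_alignment(components)
    after = array_base_alignment(components)
    print(f"{name}[]: rules (1)-(3) give {before}, rule (4) gives {after}")

# float foo[4]: the stride equals the rounded base alignment,
# so each element starts on its own 16-byte boundary.
stride = array_base_alignment(1)
foo_offsets = [i * stride for i in range(4)]
print(foo_offsets)  # [0, 16, 32, 48]
```

For every 32-bit type the rounded result is 16, which is exactly the observation behind the question.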

The same question applies to rule (9).

If my interpretation is wrong, please correct me.
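The same arithmetic for rule (9) can be sketched with a small structure. The struct S { float a; vec2 b; } below is a hypothetical example, not taken from the specification, and the member sizes and alignments are filled in by hand from rules (1) and (2).

```python
# Sketch of std140 rule (9) for a hypothetical struct S { float a; vec2 b; }.

def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


# (name, base alignment, size) in bytes, from rules (1) and (2)
members = [("a", 4, 4), ("b", 8, 8)]

# Largest member alignment, then rounded up to the base alignment of a vec4.
largest = max(align for _, align, _ in members)
struct_align = round_up(largest, 16)

# Assign member offsets relative to the start of the structure.
offset = 0
member_offsets = {}
for name, align, size in members:
    offset = round_up(offset, align)
    member_offsets[name] = offset
    offset += size

# Padding at the end: the next member starts on a multiple of struct_align.
struct_size = round_up(offset, struct_align)

print(largest, struct_align)  # 8 16
print(member_offsets)         # {'a': 0, 'b': 8}
print(struct_size)            # 16
```

Here the largest member alignment is 8, and the rounding step is what raises the structure's alignment to 16.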

Dark_Photon · January 14, 2021, 1:26am

chlee3211:

I have a question about how to read the std140 layout rules, especially the phrase “rounded up to the base alignment of a vec4”:
      (4) If the member is an array of scalars or vectors, the base alignment
          and array stride are set to match the base alignment of a single
          array element, according to rules (1), (2), and (3), and rounded up
          to the base alignment of a vec4. The array may have padding at the
          end; the base offset of the member following the array is rounded up
          to the next multiple of the base alignment.
So why does the specification mention rules (1), (2), and (3) at all if the base alignment ends up being 16 anyway?

Good question. Seems a bit overspecified to me too. I can’t think of a single existing 32-bit scalar or vector type where just skipping (1), (2), and (3) and using sizeof(vec4) doesn’t give you the same result.

Now if there was a 128-bit scalar type, with corresponding vector types (e.g. u128vec3 / i128vec3, à la ARB_gpu_shader_int64), its base alignment and stride would come out different. But there isn’t.
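A quick sketch of that counterexample, assuming a purely hypothetical 128-bit (16-byte) scalar and the usual N / 2N / 4N scaling from rules (1) to (3). No such GLSL type exists; the numbers only show where the shortcut "just use sizeof(vec4)" would stop matching the rule as written.

```python
# Compare rule (4) as written against the shortcut "always use 16".
# The 16-byte scalar is hypothetical; GLSL has no such type.

def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def array_alignment(scalar_size, components):
    """Rules (1)-(3) for the element, then rule (4)'s rounding to vec4 (16 bytes)."""
    factor = {1: 1, 2: 2, 3: 4, 4: 4}[components]
    return round_up(factor * scalar_size, 16)


SHORTCUT = 16  # sizeof(vec4)

# Existing 32-bit types: the shortcut always agrees with the full rule.
agree_32bit = all(array_alignment(4, c) == SHORTCUT for c in (1, 2, 3, 4))
print(agree_32bit)  # True

# Hypothetical 128-bit scalar and its vectors: the full rule gives more than 16.
hypothetical = {c: array_alignment(16, c) for c in (1, 2, 3, 4)}
print(hypothetical)  # {1: 16, 2: 32, 3: 64, 4: 64}
```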

My guess is this was just concern over Issue (75) in ARB_uniform_buffer_object:

(75) What is the story behind the "std140" packing, and the possible 
     "std140vec4" alternative?

...
          * array elements and structures are aligned/padded to 16-byte boundaries

        The array/structure restriction is because some implementations treat
        uniform buffers as arrays of four-component vectors and may not be able
        to efficiently perform indexed array access with strides less than 16 
        bytes.

        The "std140novec4" alternate packing illustrates an alternate approach
        without required 16-byte alignment that might be exposed as a future
        vendor extension. 
...
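One way to picture the restriction described in that issue is to model the buffer as an array of 16-byte, four-component registers. This is only an illustrative model with made-up helper names, not a description of how any particular GPU works.

```python
# Illustrative model: a uniform buffer viewed as an array of 16-byte registers,
# each holding four 4-byte components. Not tied to any real hardware.

REGISTER_BYTES = 16
COMPONENT_BYTES = 4


def locate(byte_offset):
    """Return (register index, component index) for a 4-byte value."""
    register = byte_offset // REGISTER_BYTES
    component = (byte_offset % REGISTER_BYTES) // COMPONENT_BYTES
    return register, component


def element_location(base_offset, stride, i):
    """Location of array element i for a given base offset and stride."""
    return locate(base_offset + i * stride)


# Stride 16 (std140): element i is always component 0 of register i,
# so a dynamic index only has to select a register.
padded = [element_location(0, 16, i) for i in range(4)]
print(padded)  # [(0, 0), (1, 0), (2, 0), (3, 0)]

# Stride 4 (tightly packed): all four elements share register 0, so a
# dynamic index would also have to select a component within the register.
packed = [element_location(0, 4, i) for i in range(4)]
print(packed)  # [(0, 0), (0, 1), (0, 2), (0, 3)]
```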

chlee3211 · January 14, 2021, 2:41am

Thank you for your answer, @Dark_Photon.

For the hypothetical type you mention, I think the specification could say “if its base alignment is less than that of a vec4, it is rounded up to the base alignment of a vec4”. But, as you said, a future type that does not need the 16-byte requirement could be the reason the specification is worded this way, if they don’t want to modify the specification much after introducing that type.

I hadn’t seen Issue 75; thank you for pointing it out. The answer there also says that arrays are aligned/padded to vec4 boundaries.

Dark_Photon · January 14, 2021, 12:59pm

Could be. However, I just cited this to find a counterexample, not because I thought it was the likely explanation. I think it’s just because some HW 12+ years ago required vec4 alignment, at least in some circumstances. NV_parameter_buffer_object appears to have vestiges of this.

Related: What’s also interesting is that Vulkan GLSL has cast off this old limitation, now allowing you to use std430 packing on UBOs in Vulkan 1.2 (or VK_KHR_uniform_buffer_standard_layout).

So modern GPUs support this. And there’s no reason the vendors can’t offer this support for OpenGL GLSL as an extension, if they choose to.
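To see what the difference means in practice, here is a small sketch comparing offsets for a hypothetical block { float a; float foo[4]; float b; } with and without the vec4 rounding on arrays, which is the part of std140 that std430 drops. It only handles this one block and is not a general layout calculator.

```python
# Offsets for a hypothetical block { float a; float foo[4]; float b; }
# with (std140-style) and without (std430-style) rounding arrays to vec4.

def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def layout(round_arrays_to_vec4):
    float_align = float_size = 4
    array_align = round_up(float_align, 16) if round_arrays_to_vec4 else float_align
    stride = array_align  # array stride matches the array's base alignment

    offsets = {}
    offset = 0
    offsets["a"] = offset
    offset += float_size

    offset = round_up(offset, array_align)
    offsets["foo"] = offset
    offset += 4 * stride

    # The member after the array starts on a multiple of the array's alignment.
    offset = round_up(offset, array_align)
    offsets["b"] = round_up(offset, float_align)
    return offsets, stride


std140_offsets, std140_stride = layout(round_arrays_to_vec4=True)
std430_offsets, std430_stride = layout(round_arrays_to_vec4=False)

print(std140_offsets, std140_stride)  # {'a': 0, 'foo': 16, 'b': 80} 16
print(std430_offsets, std430_stride)  # {'a': 0, 'foo': 4, 'b': 20} 4
```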

Alfonse_Reinheart · January 14, 2021, 5:49pm

Think in terms of extensions. Let’s say that you want to make an extension down the road that allows array alignment/stride to not be that of a vec4. All the extension has to do is delete the “rounded up to the base alignment of a vec4” phrase. It doesn’t have to replace that with a new rule; it just eliminates part of the rule.

Also, consider the case where you add a type that needs 32-byte alignment, such as dvec4. The rule still works, because 32-byte alignment is larger than 16-byte alignment, so the rounding changes nothing.
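Both points can be checked with a little arithmetic. The sketch below assumes 8-byte doubles, as in ARB_gpu_shader_fp64, and uses a hypothetical round_to_vec4 flag to stand in for deleting the phrase from rule (4).

```python
# Rule (4) with and without the "rounded up to the base alignment of a vec4"
# phrase. The round_to_vec4 flag is a hypothetical stand-in for an extension
# that deletes the phrase.

def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def array_alignment(scalar_size, components, round_to_vec4=True):
    """Element alignment from rules (1)-(3), optionally rounded up to 16."""
    factor = {1: 1, 2: 2, 3: 4, 4: 4}[components]
    align = factor * scalar_size
    return round_up(align, 16) if round_to_vec4 else align


# (scalar size in bytes, component count)
types = {
    "float": (4, 1),
    "vec4": (4, 4),
    "double": (8, 1),
    "dvec2": (8, 2),
    "dvec4": (8, 4),
}

with_phrase = {name: array_alignment(*t) for name, t in types.items()}
without_phrase = {
    name: array_alignment(*t, round_to_vec4=False) for name, t in types.items()
}

print(with_phrase)
# {'float': 16, 'vec4': 16, 'double': 16, 'dvec2': 16, 'dvec4': 32}
print(without_phrase)
# {'float': 4, 'vec4': 16, 'double': 8, 'dvec2': 16, 'dvec4': 32}
```

The dvec4 entry is 32 in both tables: rules (1) to (3) already exceed 16, so the rounding step has no effect on it.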

chlee3211 · January 15, 2021, 3:49pm

I’ve never written official API documentation, but after thinking about it in terms of API documents and extensions, as you said, I can understand why it is specified that way.

I need to change the way I think when I read the specification.

Thank you
