Catalyst 10.12 and std140 uniform blocks

Hello. When compiling a shader with the following uniform block on an HD5750 with Catalyst 10.12…


layout(std140) uniform spot_light_state
{
	vec3 light_color;
	float light_ambient_coefficient;
	vec3 light_direction;
	float cone_cos_angle;
	vec3 light_position;
	float cone_cos_angle_cutoff;	
	float cone_height;
	float light_quadratic_attenuation;
};

…I end up with the following memory layout (the numbers are the byte offset of each uniform):


Compiling program "spot-light":
  Uniform block "spot_light_state":
    0000:  vec3 light_color
    0016: float light_ambient_coefficient
    0032:  vec3 light_direction
    0048: float cone_cos_angle
    0064:  vec3 light_position
    0080: float cone_cos_angle_cutoff
    0084: float cone_height
    0088: float light_quadratic_attenuation

Now, when using the std140 layout surely the float “light_ambient_coefficient” should be combined with the “light_color” vec3 in a single vec4 instead of these uniforms occupying one vec4 each? That’s how D3D10 does it (which I assume std140 is supposed to mirror), and that’s also how previous Catalyst versions did it.

How did you get the memory layout report?

Basically using the glGetActiveUniformBlockiv interface. Here’s the code if you’re interested:


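// Also requires an OpenGL header or loader that declares the gl* functions
// used below, plus a gl_type_to_string() helper (not shown in this post).
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>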

void print_uniform_block_info(GLuint prog, GLint block_index, std::string const &indent = std::string())
{
	// Fetch uniform block name:
	GLint name_length;
	glGetActiveUniformBlockiv(prog, block_index, GL_UNIFORM_BLOCK_NAME_LENGTH, &name_length);
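	std::string block_name(name_length, 0);
	GLsizei block_name_written = 0;
	glGetActiveUniformBlockName(prog, block_index, name_length, &block_name_written, &block_name[0]);
	block_name.resize(block_name_written); // drop the trailing NUL counted in name_length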

	// Fetch info on each active uniform:
	GLint active_uniforms = 0;
	glGetActiveUniformBlockiv(prog, block_index, GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS, &active_uniforms);

	std::vector<GLuint> uniform_indices(active_uniforms, 0);
	glGetActiveUniformBlockiv(prog, block_index, GL_UNIFORM_BLOCK_ACTIVE_UNIFORM_INDICES, reinterpret_cast<GLint*>(&uniform_indices[0]));

	std::vector<GLint> name_lengths(uniform_indices.size(), 0);
	glGetActiveUniformsiv(prog, uniform_indices.size(), &uniform_indices[0], GL_UNIFORM_NAME_LENGTH, &name_lengths[0]);

	std::vector<GLint> offsets(uniform_indices.size(), 0);
	glGetActiveUniformsiv(prog, uniform_indices.size(), &uniform_indices[0], GL_UNIFORM_OFFSET, &offsets[0]);

	std::vector<GLint> types(uniform_indices.size(), 0);
	glGetActiveUniformsiv(prog, uniform_indices.size(), &uniform_indices[0], GL_UNIFORM_TYPE, &types[0]);
	
	std::vector<GLint> sizes(uniform_indices.size(), 0);
	glGetActiveUniformsiv(prog, uniform_indices.size(), &uniform_indices[0], GL_UNIFORM_SIZE, &sizes[0]);

	std::vector<GLint> strides(uniform_indices.size(), 0);
	glGetActiveUniformsiv(prog, uniform_indices.size(), &uniform_indices[0], GL_UNIFORM_ARRAY_STRIDE, &strides[0]);

	// Build a string detailing each uniform in the block:
	std::vector<std::string> uniform_details;
	uniform_details.reserve(uniform_indices.size());
	for(std::size_t i = 0; i < uniform_indices.size(); ++i)
	{
		GLuint const uniform_index = uniform_indices[i];

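		std::string name(name_lengths[i], 0);
		GLsizei name_written = 0;
		glGetActiveUniformName(prog, uniform_index, name_lengths[i], &name_written, &name[0]);
		name.resize(name_written); // drop the trailing NUL counted in name_lengths[i]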

		std::ostringstream details;
		details << std::setfill('0') << std::setw(4) << offsets[i] << ": " << std::setfill(' ') << std::setw(5) << gl_type_to_string(types[i]) << " " << name;

		if(sizes[i] > 1)
		{
			details << "[" << sizes[i] << "]";
		}

		details << "\n";
		uniform_details.push_back(details.str());
	}

	// Sort the uniform detail strings alphabetically. (Since each detail
	// string starts with the uniform's zero-padded byte offset, this puts
	// the uniforms in the order they are laid out in memory.)
	std::sort(uniform_details.begin(), uniform_details.end());

	// Output details:
	std::cout << indent << "Uniform block \"" << block_name << "\":\n";
	for(auto detail = uniform_details.begin(); detail != uniform_details.end(); ++detail)
	{
		std::cout << indent << "  " << *detail;
	}
}

Now, when using the std140 layout surely the float “light_ambient_coefficient” should be combined with the “light_color” vec3 in a single vec4 instead of these uniforms occupying one vec4 each?

No, they should not.

That’s how D3D10 does it (which I assume std140 is supposed to mirror), and that’s also how previous Catalyst versions did it.

I seriously doubt that this is how previous Catalyst versions did it, because AMD loves sticking to exactly what the OpenGL spec says especially when it makes life inconvenient for the user. And no, std140 is not intended to mirror D3D10.

The OpenGL specification is very clear about how std140 is laid out. Those floats are not packed the way you expect; they are laid out exactly as the driver has done.

Older versions did pack them like I said. I’ve been relying on this since uniform blocks were first introduced and my program only broke with the change to Catalyst 10.12.
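
For readers wondering what relying on the packed layout looks like on the application side, here is a minimal sketch. The struct name spot_light_state_cpu is hypothetical and not from the original code, and the sketch assumes 4-byte floats. The static_asserts pin each field to the packed offsets, so the struct could be copied into the uniform buffer as-is only if the driver reports the same offsets.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of the spot_light_state block (not from the
// original post). Assumes sizeof(float) == 4.
struct spot_light_state_cpu
{
	float light_color[3];
	float light_ambient_coefficient;
	float light_direction[3];
	float cone_cos_angle;
	float light_position[3];
	float cone_cos_angle_cutoff;
	float cone_height;
	float light_quadratic_attenuation;
};

// Each float after a vec3 sits in the vec3's unused fourth slot.
static_assert(offsetof(spot_light_state_cpu, light_ambient_coefficient) == 12, "packed after light_color");
static_assert(offsetof(spot_light_state_cpu, light_direction) == 16, "vec3 on a 16-byte boundary");
static_assert(offsetof(spot_light_state_cpu, cone_cos_angle) == 28, "packed after light_direction");
static_assert(offsetof(spot_light_state_cpu, light_position) == 32, "vec3 on a 16-byte boundary");
static_assert(offsetof(spot_light_state_cpu, cone_cos_angle_cutoff) == 44, "packed after light_position");
static_assert(offsetof(spot_light_state_cpu, cone_height) == 48, "scalar, 4-byte aligned");
static_assert(offsetof(spot_light_state_cpu, light_quadratic_attenuation) == 52, "scalar, 4-byte aligned");
```

With the layout reported by Catalyst 10.12, every one of these offsets except the first differs from what the driver reports, which is why code written this way breaks.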

Anyway, according to glspec41.core.20100725, p. 88: “1. If the member is a scalar consuming N basic machine units, the base alignment is N.” For a float that would mean 4 bytes.

Further, it says: “3. If the member is a three-component vector with components consuming N basic machine units, the base alignment is 4N.” This means a vec3 should align to a 16-byte boundary. It does not say the vector should occupy 16 bytes. Its size should still be 12 bytes, which means a float following a vec3 should be placed 12 bytes after the start of that vec3.

Or am I reading this completely wrong?
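
The reading above can be checked with a few lines of arithmetic. The following is a minimal sketch, not driver code: it applies only the two quoted rules (scalar alignment N, vec3 alignment 4N with size 3N), assumes N = 4 bytes, and all names in it (member_kind, align_up, std140_offsets, spot_light_state_offsets) are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Base alignment and size in bytes of a block member. Assumes 4-byte floats.
struct member_kind
{
	std::size_t alignment;
	std::size_t size;
};

member_kind const kind_float = { 4, 4 };   // rule 1: scalar, base alignment N
member_kind const kind_vec3  = { 16, 12 }; // rule 3: base alignment 4N, size 3N

// Round offset up to the next multiple of alignment.
std::size_t align_up(std::size_t offset, std::size_t alignment)
{
	return (offset + alignment - 1) / alignment * alignment;
}

// Byte offset of each member, in declaration order: align the running
// offset to the member's base alignment, then advance by its size.
std::vector<std::size_t> std140_offsets(std::vector<member_kind> const &members)
{
	std::vector<std::size_t> offsets;
	std::size_t cursor = 0;
	for(std::size_t i = 0; i < members.size(); ++i)
	{
		cursor = align_up(cursor, members[i].alignment);
		offsets.push_back(cursor);
		cursor += members[i].size;
	}
	return offsets;
}

// Offsets for the spot_light_state block from the first post.
std::vector<std::size_t> spot_light_state_offsets()
{
	std::vector<member_kind> members;
	members.push_back(kind_vec3);  // light_color
	members.push_back(kind_float); // light_ambient_coefficient
	members.push_back(kind_vec3);  // light_direction
	members.push_back(kind_float); // cone_cos_angle
	members.push_back(kind_vec3);  // light_position
	members.push_back(kind_float); // cone_cos_angle_cutoff
	members.push_back(kind_float); // cone_height
	members.push_back(kind_float); // light_quadratic_attenuation
	return std140_offsets(members);
}
```

Under these assumptions the offsets come out as 0, 12, 16, 28, 32, 44, 48 and 52, compared with the 0, 16, 32, 48, 64, 80, 84 and 88 reported by Catalyst 10.12.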

You appear to be correct, as far as the spec is concerned. I’ve read the alignment rules before, but always assumed that the alignment worked the way it does in most C/C++ compilers, where if a struct needed a higher alignment, the effective size of the structure was padded out to its alignment. So sizeof(vec3) was always the same as sizeof(vec4).

But as you point out, the spec doesn’t say that.
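
To illustrate the C/C++ behaviour described above, here is a small sketch. The types cxx_vec3 and cxx_pair are hypothetical stand-ins, and 4-byte floats are assumed. In C++ the size of a type is always a multiple of its alignment, so an over-aligned three-float struct grows to 16 bytes and a float declared after it cannot reuse the padding.

```cpp
#include <cstddef>

// Hypothetical stand-in for a vec3: three floats, forced to 16-byte alignment.
struct alignas(16) cxx_vec3
{
	float x, y, z;
};

// A float declared right after the vec3-like member.
struct cxx_pair
{
	cxx_vec3 color;
	float ambient;
};

// C++ pads the struct out to its alignment: 12 bytes of data, 16 bytes of size.
static_assert(sizeof(cxx_vec3) == 16, "size is rounded up to the alignment");

// So the following float starts at 16, not at 12.
static_assert(offsetof(cxx_pair, ambient) == 16, "float lands after the padding");
```

That matches the 0/16 offsets in the Catalyst 10.12 report, whereas the quoted std140 rules give a vec3 an alignment of 16 bytes without rounding its size up, leaving the fourth slot free for a following float.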

My guess is that it’s a bug introduced because ATI’s new 6970 is a 4-way VLIW design instead of the 5-way VLIW that ATI’s earlier DX10/11 cards used.

It’s a regression according to what you said. We will fix it.

Excellent. Thank you. :)
