Writing gl_GlobalInvocationID from Compute-Shader into a Shader-Storage-Block

Hi developers,

My goal is to better understand the Compute-Shader architecture, which is why I want to write the value of gl_GlobalInvocationID for every Compute-Shader invocation into a Shader-Storage-Block that contains a uvec3 data[] array. However, when I inspect the buffer contents on the C++ side, I get something unexpected.

First I'll explain what I expect and why. Then I'll show part of the unexpected buffer contents.

The following is the complete Compute-Shader code…

#version 450 core

#extension GL_NV_shader_thread_shuffle : require
#extension GL_NV_shader_thread_group : require

layout (local_size_x = 6, local_size_y = 6, local_size_z = 5) in;

layout (std430, binding = 2) buffer TestBuffer
{
    uvec3 data[];
} testBuffer;

uniform int width;
uniform int height;

void main()
{
    uint x = gl_GlobalInvocationID.x;
    uint y = gl_GlobalInvocationID.y;
    uint z = gl_GlobalInvocationID.z;

    uint flatIndex = x + (y * width) + (z * width * height);
    testBuffer.data[flatIndex] = gl_GlobalInvocationID;
}

The important parts should be the TestBuffer block and the body of main, but you never know where a mistake might hide.

The variable gl_GlobalInvocationID is of type uvec3. It uniquely identifies a Compute-Shader invocation/Work-Item in the Global-Work-Group by its coordinates (x, y, z). To store the 3D index gl_GlobalInvocationID in the 1D array of the buffer, I need to map it to a 1D index, which is what the flatIndex variable does. The following image illustrates how this calculation works…

Each small cube is a Work-Item, a block of Work-Items is a Local-Work-Group, and the entirety of Local-Work-Groups is the Global-Work-Group.
The number to the left of the equals sign is the 1D index and the number to the right of the equals sign is the 3D index (gl_GlobalInvocationID).
So I would expect the buffer contents to look something like this…

1D-Index 3D-Index
0: [0 0 0]
1: [1 0 0]
2: [2 0 0]
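The expected listing follows directly from the flatIndex formula, which can be mirrored on the CPU. This is a minimal sketch, assuming width and height are the global dimensions measured in Work-Items (which is what the shader's uniforms are meant to hold); `Index3`, `flatten` and `unflatten` are hypothetical helper names, not part of any API.

```cpp
#include <cassert>
#include <cstdint>

// CPU-side mirror of the shader's 3D -> 1D mapping (helper names are hypothetical).
// width and height are the global dimensions measured in Work-Items.
struct Index3 { uint32_t x, y, z; };

uint32_t flatten(Index3 id, uint32_t width, uint32_t height)
{
    return id.x + id.y * width + id.z * width * height;
}

// Inverse mapping: recovers (x, y, z) from a flat index.
Index3 unflatten(uint32_t i, uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    return { i % width, (i / width) % height, i / (width * height) };
}
```

For a 12x12 grid (for example, 2x2 Local-Work-Groups of 6x6 Work-Items), `unflatten(1, 12, 12)` yields (1, 0, 0) and `unflatten(13, 12, 12)` yields (1, 1, 0), i.e. x runs fastest, then y, then z.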

What I get instead is…

1D-Index 3D-Index
0: [0 0 0]
1: [0 1 0]
2: [0 0 2]
3: [0 0 0]
4: [3 0 0]
5: [0 4 0]
6: [0 0 5]
7: [0 0 0]
8: [6 0 0]
9: [0 7 0]
10: [0 0 8]
11: [0 0 0]
12: [9 0 0]
13: [0 10 0]
14: [0 0 11]
15: [0 0 0]
16: [12 0 0]
17: [0 13 0]
18: [0 0 14]
19: [0 0 0]
20: [15 0 0]

I have the feeling that this has something to do with how I back the buffer and read from it on the C++ side. The following shows the C++ side of things…

Creating the buffer…

context->glGenBuffers(1, &m_testBuffer);

Creating the buffer's data store and binding it to binding point 2…

context->glBindBuffer(GL_SHADER_STORAGE_BUFFER, m_testBuffer);
context->glBufferData(GL_SHADER_STORAGE_BUFFER, (totalBlockNum*bx*by*bz) * sizeof(uvec3), NULL, GL_DYNAMIC_READ);
context->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, m_testBuffer);

uvec3 is a struct that I defined…

struct uvec3
{
    uint32_t x;
    uint32_t y;
    uint32_t z;
};

Reading from the buffer and writing its contents to a file…

    uvec3 *testBuffer = (uvec3*)context->glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY);
    if (!testBuffer)
    {
        std::cout << "testBuffer could not be mapped" << std::endl;
    }
    else
    {
        std::stringstream ss;

        ss << "Index Value\n";
        for (uint32_t i = 0; i < (totalBlockNum*bx*by*bz); ++i)
        {
            ss << i << ": [" << testBuffer[i].x << " " << testBuffer[i].y << " " << testBuffer[i].z << "]\n";
        }

        // Release the mapping once the data has been copied into the stream.
        context->glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);

        std::string str = ss.str();
        Util::writeToFile("F:/Programmierung/Projekte/MCVis/testBuffer.txt", QString::fromStdString(str));
    }

I suspect that interpreting the buffer as a uvec3 array doesn't work correctly.
If someone has a hint, I would appreciate it.

Happy coding!

uvec3[] has the same layout as uvec4[]. In this regard, there’s no difference between std140 and std430. Either change your struct uvec3 to have an extra element for the padding, or just use uvec4 instead (using uvec3 doesn’t save memory).
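The posted listing can be reproduced by simulating the GPU writes and the CPU reads in plain memory. This is a sketch under two assumptions: element i holds gl_GlobalInvocationID (i, 0, 0) (which matches the first rows of the listing), and the never-written padding words happen to be zero (their real contents are undefined, since glBufferData was called with NULL). `uvec3Packed`, `uvec3Padded`, `simulateStd430Writes` and `readAs` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// 12-byte struct as in the question: does NOT match a std430 uvec3[] element.
struct uvec3Packed { uint32_t x, y, z; };
// 16-byte struct with an explicit padding word: matches the std430 uvec3[] stride.
struct uvec3Padded { uint32_t x, y, z, pad; };
static_assert(sizeof(uvec3Packed) == 12, "packed: 3 words");
static_assert(sizeof(uvec3Padded) == 16, "padded: 4 words");

// Simulates the shader: element i of uvec3 data[] starts at word 4*i (16-byte stride).
// Assumes gl_GlobalInvocationID == (i, 0, 0); padding words are left at 0.
std::vector<uint32_t> simulateStd430Writes(uint32_t count)
{
    std::vector<uint32_t> words(4 * count, 0);
    for (uint32_t i = 0; i < count; ++i) {
        words[4 * i + 0] = i; // x
        words[4 * i + 1] = 0; // y
        words[4 * i + 2] = 0; // z
    }
    return words;
}

// Reinterprets the raw words as an array of T, like casting the mapped pointer.
template <typename T>
std::vector<T> readAs(const std::vector<uint32_t>& words)
{
    std::vector<T> out(words.size() * sizeof(uint32_t) / sizeof(T));
    std::memcpy(out.data(), words.data(), out.size() * sizeof(T));
    return out;
}
```

Reading the simulated words with `readAs<uvec3Packed>` gives [0 1 0] at index 1, [0 0 2] at index 2 and [3 0 0] at index 4, exactly the shifted pattern from the question, while `readAs<uvec3Padded>` gives [i 0 0] at every index. Note that with the padded struct, `sizeof(uvec3)` in the glBufferData call also becomes 16, which the shader needs anyway: with the 12-byte struct the buffer is only three quarters of the size that the shader's writes cover.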

The main difference between the two layouts is that std140 doesn't allow arrays whose stride isn't a multiple of 4 words. std430 allows this, but it still doesn't allow a vector element to straddle the boundary between 4-word blocks. It does allow packed arrays of 1-word and 2-word elements, with each 4-word block containing 4 or 2 elements respectively (3-element vectors get a padding word added).
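These stride rules can be summarized as a small calculation. This sketch only covers arrays of 32-bit scalars and vectors (uint, uvec2, uvec3, uvec4); structs, matrices and doubles follow additional rules that are not modeled here. `Layout` and `arrayStride` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

enum class Layout { Std140, Std430 };

// Array stride in bytes for an array of 32-bit scalars or vectors with
// `components` components, following the std140/std430 alignment rules.
uint32_t arrayStride(uint32_t components, Layout layout)
{
    assert(components >= 1 && components <= 4);
    // Base alignment: 1 word for scalars, 2 words for 2-vectors,
    // 4 words for 3- and 4-vectors (this is where uvec3 gets its padding).
    uint32_t alignWords = (components == 1) ? 1 : (components == 2) ? 2 : 4;
    uint32_t stride = alignWords * 4;
    // std140 additionally rounds the array element alignment up to 16 bytes.
    if (layout == Layout::Std140 && stride < 16)
        stride = 16;
    return stride;
}
```

`16 / arrayStride(n, Layout::Std430)` gives 4, 2, 1 and 1 elements per 4-word block for n = 1..4, while every std140 case comes out at a 16-byte stride.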

PS: you can use gl_NumWorkGroups.xy instead of the width and height uniforms.

Thank you so much! Now the buffer contents make more sense.

But I need the number of Work-Items in the x and y directions, not the number of Local-Work-Groups. Or am I misunderstanding something?

Oh right; gl_NumWorkGroups.xy * gl_WorkGroupSize.xy will give you the global total.
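The difference between the two widths can be checked by simulating a dispatch on the CPU. This is a sketch assuming the 6x6x5 local size from the shader above and a hypothetical dispatch of 2x2x1 Local-Work-Groups; `Dim3` and `countWrites` are illustrative names. It builds each gl_GlobalInvocationID as gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID and counts how often each flat index is written.

```cpp
#include <cstdint>
#include <vector>

struct Dim3 { uint32_t x, y, z; };

// Simulates glDispatchCompute(numGroups.x, numGroups.y, numGroups.z) with the given
// local size and returns, for each flat index, how many invocations write to it.
// width/height are whatever the shader would use in its flatIndex formula.
std::vector<uint32_t> countWrites(Dim3 numGroups, Dim3 localSize,
                                  uint32_t width, uint32_t height)
{
    const uint32_t total = numGroups.x * localSize.x * numGroups.y * localSize.y
                         * numGroups.z * localSize.z;
    std::vector<uint32_t> hits(total, 0);
    for (uint32_t wz = 0; wz < numGroups.z; ++wz)
    for (uint32_t wy = 0; wy < numGroups.y; ++wy)
    for (uint32_t wx = 0; wx < numGroups.x; ++wx)
        for (uint32_t lz = 0; lz < localSize.z; ++lz)
        for (uint32_t ly = 0; ly < localSize.y; ++ly)
        for (uint32_t lx = 0; lx < localSize.x; ++lx) {
            // gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
            uint32_t x = wx * localSize.x + lx;
            uint32_t y = wy * localSize.y + ly;
            uint32_t z = wz * localSize.z + lz;
            uint32_t flat = x + y * width + z * width * height;
            if (flat < hits.size())  // out-of-range writes are simply not counted
                ++hits[flat];
        }
    return hits;
}
```

With width = gl_NumWorkGroups.x * gl_WorkGroupSize.x (12 here) and the analogous height, every one of the 720 indices is written exactly once; with width = gl_NumWorkGroups.x (2 here), different invocations collide on the same index, e.g. (2, 0, 0) and (0, 1, 0) both map to 2.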