Correct VK_KHR_cooperative_matrix coopMatLoad usage with shared variables?

Hi,

I would like to know exactly how coopMatLoad from the VK_KHR_cooperative_matrix extension works when loading from shared variables. The definition is:

void coopMatLoad(out coopmat m, float_or_float16_t[] buf, uint element, uint stride, colMajor / rowMajor);

  • m is the cooperative matrix that receives the result of the load operation.
  • buf is an array of 32-bit or 16-bit elements (it can also be a shared variable).
  • element is the offset, in elements, inside buf where the load starts.
  • stride is the distance in buf between the starts of consecutive rows (or columns), which leaves room for padding at the end of each row or column.
  • rowMajor / colMajor indicates whether to load the data from buf in row-major or column-major fashion (in practice, the values gl_CooperativeMatrixLayoutRowMajor or gl_CooperativeMatrixLayoutColumnMajor are used).

For instance, given a 16 rows x 8 columns matrix and a shared variable of 128 elements with initialized values ready:

    shared float16_t sharedVariable[128];
    coopmat<float16_t, gl_ScopeSubgroup, 16, 8, gl_MatrixUseA> m;

Is the command below correct to load the 128 elements from the shared variable sharedVariable into the matrix m so that no element is missing (all tightly loaded, with no “0.0” values due to stride)?

    coopMatLoad(m, sharedVariable, 0, 8, gl_CooperativeMatrixLayoutRowMajor);
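
To make the question concrete, here is a minimal CPU-side sketch in Python of the addressing I am assuming. It is not the driver's implementation: the function name coop_mat_load_indices is hypothetical, and it assumes that for a row-major load element (i, j) is read from buf[element + i * stride + j], with rows and columns swapped for column-major.

```python
def coop_mat_load_indices(rows, cols, element, stride, row_major=True):
    """Return, for each matrix element (i, j), the buffer index it would read.

    Assumed model (hypothetical helper, not part of any API):
      row-major:    buf[element + i * stride + j]
      column-major: buf[element + j * stride + i]
    """
    indices = []
    for i in range(rows):
        row = []
        for j in range(cols):
            if row_major:
                row.append(element + i * stride + j)
            else:
                row.append(element + j * stride + i)
        indices.append(row)
    return indices


# The case from the question: 16 x 8 matrix, element = 0, stride = 8.
idx = coop_mat_load_indices(16, 8, element=0, stride=8)
flat = [k for row in idx for k in row]

# True when every index 0..127 is read exactly once (no gaps, no repeats).
covers_all = sorted(flat) == list(range(128))
print("stride 8 covers all 128 elements:", covers_all)

# A padded layout for comparison: stride 10 leaves 2 unused slots per row.
padded = coop_mat_load_indices(16, 8, element=0, stride=10)
highest = max(k for row in padded for k in row)
print("highest index touched with stride 10:", highest)
```

If that model is right, a stride of 8 (equal to the number of columns) reads indices 0 to 127 exactly once each, while a larger stride would skip elements and need a bigger buffer.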

Thanks,
T

Dear @Temp

The key thing to remember with VK_KHR_cooperative_matrix is that the coopMatLoad instructions don’t magically pull data out of shared memory unless you set them up correctly. They expect the data to be laid out in memory in a way the driver understands, and they also need proper synchronization so all threads see the same values.

In practice, that means:

  • Make sure your shared variables are aligned and sized the way the cooperative matrix type requires.
  • Use a barrier (barrier() in GLSL, which becomes OpControlBarrier in SPIR-V) so the load happens after the data is written.
  • Treat the load as a normal memory access — if the shared variable isn’t populated yet, you’ll just get garbage.

Once you think of it like “just another memory read, but with matrix semantics,” it becomes easier: write the data into shared memory, synchronize, then call coopMatLoad.

ex:

    barrier(); // in SPIR-V: OpControlBarrier with Workgroup execution and memory scope, AcquireRelease semantics

GL;HC;

~p3nGu1nZz
