I’ve written a compute shader that generates mesh data. I have two output buffers that I need to fill with the point and normal data I’ve computed. In the current version of my shader, I’m declaring them as
layout(r32f, set = 0, binding = 4) writeonly restrict uniform image1D result_points;
layout(r32f, set = 0, binding = 5) writeonly restrict uniform image1D result_normals;
and writing to them as
imageStore(result_points, write_pos + i * 3, local_point_pos.xxxx);
imageStore(result_points, write_pos + i * 3 + 1, local_point_pos.yyyy);
imageStore(result_points, write_pos + i * 3 + 2, local_point_pos.zzzz);
imageStore(result_normals, write_pos + i * 3, grad.xxxx);
imageStore(result_normals, write_pos + i * 3 + 1, grad.yyyy);
imageStore(result_normals, write_pos + i * 3 + 2, grad.zzzz);
imageStore only accepts a vec4, so I'm padding each write with redundant data.
I noticed during testing that the shader runs much faster if I switch the image format:
layout(rgba32f, set = 0, binding = 4) writeonly restrict uniform image1D result_points;
layout(rgba32f, set = 0, binding = 5) writeonly restrict uniform image1D result_normals;
...
imageStore(result_points, write_pos + i, local_point_pos);
imageStore(result_normals, write_pos + i, grad);
Probably because there are fewer imageStore calls. Unfortunately, this also means I'm writing 4 floats per point instead of 3, and the next stage of my pipeline requires a tightly packed buffer of vec3s.
Is there an efficient way to write vec3s to my buffer, or should I write vec4s and then postprocess the buffer on the CPU?
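In case it helps, this is roughly what I imagine the CPU postprocessing route would look like after reading the buffer back (a NumPy sketch; the function name and the assumption that the readback is a flat float32 array of vec4s are mine):

```python
import numpy as np

def strip_w(raw: np.ndarray) -> np.ndarray:
    """Drop the unused 4th component of each vec4 to get a packed vec3 buffer."""
    # raw: flat float32 buffer read back from the GPU, one vec4 per point
    return np.ascontiguousarray(raw.reshape(-1, 4)[:, :3]).reshape(-1)

# two points (0,1,2) and (4,5,6), each padded with a w component on the GPU side
raw = np.arange(8, dtype=np.float32)
packed = strip_w(raw)  # -> [0. 1. 2. 4. 5. 6.]
```

It works, but it's an extra pass over the whole result every frame, which is what I'd like to avoid.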