I have to fit a normal transformation matrix into a vec4 of a GLSL shader. The normal matrix is defined as the transpose of the inverse of the upper-left 3x3 part of the model matrix.

Are you aware of an equivalent form that takes only 4 components and can be turned back into a matrix inside the shader without much effort (no sin/cos etc.)?

That seems unlikely in the general case. If the initial matrix you performed the inverse-transpose on consisted only of uniform scale and rotations (i.e., no shearing, non-uniform scaling, or other such things), then you can use the same rotation without the scaling to produce the correct direction. An orientation can be stored as a quaternion, which fits in a vec4.
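To illustrate the quaternion route, here is a minimal sketch of rotating a normal by a unit quaternion using only multiplies, adds, and cross products (no sin/cos). It is written in C so it can be checked on the CPU; the type and function names (Vec3, Quat, cross3, quat_rotate) are made up for this example, and it assumes the quaternion is normalized with w as the scalar part.

```c
/* Hypothetical helper types; a GLSL shader would use vec3 and vec4 instead. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Quat; /* w is the scalar part */

static Vec3 cross3(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Rotate v by the unit quaternion q:
 *   v' = v + 2 * cross(q.xyz, cross(q.xyz, v) + q.w * v)
 * Assumes q is normalized; no trigonometry is needed. */
static Vec3 quat_rotate(Quat q, Vec3 v) {
    Vec3 u = { q.x, q.y, q.z };
    Vec3 t = cross3(u, v);
    t.x += q.w * v.x;
    t.y += q.w * v.y;
    t.z += q.w * v.z;
    Vec3 r = cross3(u, t);
    Vec3 out = { v.x + 2.0f * r.x, v.y + 2.0f * r.y, v.z + 2.0f * r.z };
    return out;
}
```

In GLSL the same expression would be roughly v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v), with q passed as the vec4. This only gives the correct normal direction under the constraint described above (rotation plus uniform scale).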

However, this requires having the orientation as a separate value from the other transformations used to construct a matrix. If you’re doing skinning, or any hierarchical transformations, that’s not something you can easily compute. For a general matrix, you would need to perform singular-value decomposition on the matrix to generate its orientation.

In fact I need it for Vulkan, because I have to fit some stuff into the 128-byte limit of push constants, but I posted it here because in theory it has to work on OpenGL too.
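For a sense of the byte budget, here is a rough C sketch of a host-side struct mirroring a push-constant block. The layout and field names are purely hypothetical (the actual contents were not posted); it only shows that a mat4 plus a vec4 quaternion leaves 48 of the 128 bytes free. A mat3 in the same block would itself take 48 bytes, since each column is padded to 16 bytes.

```c
#include <stddef.h>

/* Hypothetical push-constant layout, for illustration only. */
typedef struct {
    float model_view_proj[16]; /* mat4: 64 bytes */
    float normal_rotation[4];  /* unit quaternion in a vec4: 16 bytes */
    float other[12];           /* 48 bytes left for other per-draw data */
} PushConstants;

/* 128 bytes is the budget mentioned above; fail the build if it is exceeded. */
_Static_assert(sizeof(PushConstants) <= 128,
               "push constants exceed the 128-byte budget");
```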

You can’t fit a 3x3 matrix into 4 components. Unless there is some constraint on the transforms used to generate the matrix, the 3x3 matrix (either the original or its inverse-transpose) could be literally any 9 values.

Any 3x3 matrix can be generated as the composition of a rotation, a non-uniform scaling, and another rotation. This is what singular value decomposition gives you.

If it was purely a rotation, then 3 components would suffice (although you’d need trig to turn those back into a matrix).

A different idea:
I realized that some of the push constants are unique per vertex buffer.
Would it be possible to add the constants as vertex attributes using only a small buffer, then replicate this buffer for all vertices using either glVertexAttribDivisor with a divisor of 1 or the (supposed?) Vulkan equivalent, VK_VERTEX_INPUT_RATE_INSTANCE? Would this be faster than a UBO?

Question: Is it possible to bind a smaller buffer to a larger GLSL uniform block? I have a situation where the buffer might be larger in one frame and smaller in another, and I have to define the block size in the shader in advance, right?

Some implementations, e.g. MoltenVK, will throw wobblies if you do this. I discovered this because specialization constants for array lengths don’t work, so the array in my shader kept its large default length.

So far on Nvidia binding a smaller buffer with an appropriately reduced range seems to work fine.

Another question:
Say I want to use the same uniform buffer object in both the vertex and fragment stages. Do I get a performance penalty for not using the same buffer block name in the vertex and fragment shaders?