Processing cost of glUniformMatrix*()

While programming my vertex shader, I’ve run into a question:

GPUs are far faster than CPUs, so they’ll be able to do matrix multiplications faster than my own program will. However, the more matrices I send to the GPU, the more calls I’ll have to make to glUniformMatrix*().
So, for 3D rotations, is it better to:

  • send four uniforms to the shader (rotation in X, rotation in Y, rotation in Z, translation) and multiply them there, which may take four calls to glUniformMatrix*()
    OR
  • do the multiplications on the CPU and then make a single glUniformMatrix*() call with the resulting matrix?
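
For illustration, the second option might look roughly like the sketch below. The helper names (`Mat4`, `multiply`, `buildModelMatrix`) and the composition order (translation applied after the Z, Y and X rotations) are assumptions made for this example, not something the thread specifies. The GL call is shown only as a comment so the snippet compiles without an OpenGL context, and `modelLocation` is a hypothetical uniform location.

```cpp
#include <array>
#include <cmath>

// Column-major 4x4 matrix: element (row, col) lives at index col * 4 + row.
// This is the layout glUniformMatrix4fv reads when transpose is GL_FALSE.
using Mat4 = std::array<float, 16>;

Mat4 identity() {
    Mat4 r{};  // value-initialised to all zeros
    r[0] = r[5] = r[10] = r[15] = 1.0f;
    return r;
}

// Returns a * b for column-major matrices.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            r[col * 4 + row] = sum;
        }
    return r;
}

Mat4 rotationX(float rad) {
    Mat4 r = identity();
    const float c = std::cos(rad), s = std::sin(rad);
    r[5] = c;  r[6] = s;   // column 1: (0,  c, s, 0)
    r[9] = -s; r[10] = c;  // column 2: (0, -s, c, 0)
    return r;
}

Mat4 rotationY(float rad) {
    Mat4 r = identity();
    const float c = std::cos(rad), s = std::sin(rad);
    r[0] = c; r[2] = -s;   // column 0: (c, 0, -s, 0)
    r[8] = s; r[10] = c;   // column 2: (s, 0,  c, 0)
    return r;
}

Mat4 rotationZ(float rad) {
    Mat4 r = identity();
    const float c = std::cos(rad), s = std::sin(rad);
    r[0] = c;  r[1] = s;   // column 0: ( c, s, 0, 0)
    r[4] = -s; r[5] = c;   // column 1: (-s, c, 0, 0)
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r[12] = x; r[13] = y; r[14] = z;  // column 3 holds the offset
    return r;
}

// Hypothetical helper: three CPU-side multiplies per object, once per draw call.
// The order T * Rz * Ry * Rx is an assumption for this sketch.
Mat4 buildModelMatrix(float rx, float ry, float rz,
                      float tx, float ty, float tz) {
    return multiply(translation(tx, ty, tz),
                    multiply(rotationZ(rz),
                             multiply(rotationY(ry), rotationX(rx))));
}

// With a GL context, the result would then go up in a single call, e.g.:
//   glUniformMatrix4fv(modelLocation, 1, GL_FALSE, model.data());
```

The shader then needs only one mat4 uniform and one matrix-vector multiply per vertex.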

Generally, keep things balanced between the CPU and the GPU.
Assume you have 2000 draw calls per frame, with 1000 vertices per draw call on average. If you do the 4 matrix-matrix multiplies on the CPU, you’ll be making the CPU do 2000 × 4 matrix-matrix multiplies per frame. If you do the 4 multiplies on the GPU, in a vertex shader, you’ll be making the GPU do 2000 × 1000 × 4 multiplies per frame.
You could use transform feedback or similar to make the GPU do only 2000 × 4 multiplies, write the results to a UBO, and read from that UBO in the vertex shaders. But you may end up under-utilizing the CPU and putting too much (or inconsistent) work on the GPU.
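
To make the arithmetic concrete, here is a small sketch that works out both totals for the figures in the example (2000 draw calls, 1000 vertices per draw call, 4 multiplies per transform). The `FrameCost` struct and `estimate` function are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Matrix-matrix multiplies per frame under each strategy.
struct FrameCost {
    std::uint64_t cpuMultiplies;  // matrices combined once per draw call
    std::uint64_t gpuMultiplies;  // matrices combined again for every vertex
};

constexpr FrameCost estimate(std::uint64_t drawCalls,
                             std::uint64_t verticesPerDraw,
                             std::uint64_t multipliesPerTransform) {
    return FrameCost{drawCalls * multipliesPerTransform,
                     drawCalls * verticesPerDraw * multipliesPerTransform};
}

// For the numbers in the post:
//   estimate(2000, 1000, 4).cpuMultiplies is 8000
//   estimate(2000, 1000, 4).gpuMultiplies is 8000000
```

Combining the matrices in the vertex shader repeats the same work for every vertex, which is where the factor of 1000 comes from.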

Generally, don’t overthink it, except in the part of your engine where you draw lots of grass, trees, bushes, etc., where instancing will be required.

Btw, for much of the static geometry, you won’t be using “rot x/y/z, position” but a 4x3 matrix per object (its transformation in world space). Generally, no multiplies are necessary.
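
As a rough sketch of the per-object matrix idea, the snippet below assumes that "4x3" means the GLSL `mat4x3` layout (4 columns, 3 rows, 12 floats): an affine 4x4 world matrix with its constant bottom row (0, 0, 0, 1) dropped. The names `Mat4x3`, `dropBottomRow` and `transformPoint` are hypothetical, and `transformPoint` only mirrors on the CPU what a vertex shader would compute per vertex.

```cpp
#include <array>

using Mat4   = std::array<float, 16>;  // column-major, index = col * 4 + row
using Mat4x3 = std::array<float, 12>;  // column-major, index = col * 3 + row

// Keeps the top three rows of an affine matrix; the bottom row is implied.
Mat4x3 dropBottomRow(const Mat4& m) {
    Mat4x3 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 3; ++row)
            r[col * 3 + row] = m[col * 4 + row];
    return r;
}

// Transforms a point (w = 1): rotation/scale columns plus the offset column.
std::array<float, 3> transformPoint(const Mat4x3& m,
                                    const std::array<float, 3>& p) {
    std::array<float, 3> out{};
    for (int row = 0; row < 3; ++row)
        out[row] = m[0 * 3 + row] * p[0] + m[1 * 3 + row] * p[1] +
                   m[2 * 3 + row] * p[2] + m[3 * 3 + row];
    return out;
}

// With a GL context, a mat4x3 uniform could be set with, e.g.:
//   glUniformMatrix4x3fv(location, 1, GL_FALSE, packed.data());
```

For static objects the matrix can be computed once at load time and reused every frame, so nothing has to be multiplied on the CPU per frame.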