So of course I mean that it takes time to load the Matrices, the weights, the vertices and the indexes to the GPU and back. Is the increase in speed of the GPU worth the time it takes to transfer this stuff?
Why back? What are you going to do with the data other than render it?
Ultimately it depends upon the complexity of the calculations, contention for the CPU, and contention for the GPU. If the GPU is saturated but the CPU is largely idle, there isn’t much point in moving work from the CPU to the GPU.
As with any CPU-GPU transfer, care must be taken to avoid (or minimise) synchronisation. A CPU-GPU-CPU round trip can add a lot of latency, which may be an issue.
In some cases, moving animation pose generation from the CPU to the GPU can result in large performance gains. I’ve done this and gotten big benefits batching hundreds of animated characters together in a single draw call. But it depends on your requirements as to whether this is a win or not.
Just to add to what GClements has already given you to consider…
- From your wording, I’m inferring that this is skeletal animation, yes?
(i.e. skin meshes in bind pose, joint transform palette for each animation track, etc.)
- How many animated characters do you plan to render at once?
- Per frame, will animated characters be posed by only 1 modeler-generated animation track, or will you be supporting animation track blending (e.g. for cross-fading between animation tracks)?
- Besides rendering, do you have CPU-side processing that requires the posed mesh data (e.g. collision or intersection testing)?
If you have a small number of animated characters to render per frame and/or a small number of joints/bones in your joint skeletons/animation tracks, then there’s obviously less gain to be had here by moving the joint palette (“Matrices”) generation from the CPU to the GPU. But the more characters and/or the more joints (aka bones), the more the potential gain. It’s fairly easy to store joint transforms on the GPU (as dual quaternions, quaternion+translation, or matrices), sample them, and blend them in the shader (e.g. for smooth skinning; i.e. multiple joint influences and pose-gen between keyframes). With this, you don’t need to pregenerate “posed mesh” transform palettes for the current frame for each active character/animation. They only exist on the GPU, computed on-the-fly in the shader.
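To make the "sample and blend in the shader" idea concrete, here is a rough CPU-side sketch in Python of the per-vertex math a skinning shader would do. It is an illustration under assumptions, not anyone's actual implementation: joint transforms are stored as (quaternion, translation) pairs that already map bind-pose positions to posed positions, keyframes are interpolated with a normalized lerp, and all function names are made up for the example.

```python
import math

def nlerp(q0, q1, t):
    """Normalized linear interpolation between unit quaternions (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    sign = -1.0 if dot < 0.0 else 1.0  # interpolate along the shorter arc
    q = [(1.0 - t) * a + t * sign * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotate(q, v):
    """Rotate v by unit quaternion q: v + 2w(u x v) + 2u x (u x v)."""
    u, w = q[:3], q[3]
    c = cross(u, v)
    cc = cross(u, c)
    return [v[i] + 2.0 * (w * c[i] + cc[i]) for i in range(3)]

def skin_vertex(position, joint_ids, weights, key0, key1, t):
    """Pose one bind-pose vertex between keyframes key0 and key1 (0 <= t <= 1).

    key0 / key1: per-joint (quaternion, translation) skinning transforms
    (hypothetical layout chosen for this sketch).
    """
    out = [0.0, 0.0, 0.0]
    for j, w in zip(joint_ids, weights):
        # Pose-gen between keyframes for this joint.
        q = nlerp(key0[j][0], key1[j][0], t)
        tr = [(1.0 - t) * a + t * b for a, b in zip(key0[j][1], key1[j][1])]
        # Smooth skinning: weighted sum of the vertex posed by each joint.
        p = rotate(q, position)
        for i in range(3):
            out[i] += w * (p[i] + tr[i])
    return out
```

In a real shader the two keyframe tables would live in a buffer or texture and `t` would be a per-character value, so nothing per-joint has to be computed or uploaded by the CPU each frame.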
However, if you need the posed mesh transforms on the CPU for some operation like collision or intersection testing, or if you plan to support animation blending (blending between multiple animation tracks on one character within the same frame; e.g. for cross-fading animations), then that starts to pull the other way (toward joint transform palette gen on the CPU).
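For reference, cross-fading between two tracks boils down to a per-joint blend of two already-sampled poses. The Python sketch below assumes each pose is a list of per-joint (quaternion, translation) pairs in the same space (typically parent-relative, before the hierarchy is concatenated); the names and layout are hypothetical.

```python
import math

def nlerp(q0, q1, t):
    """Normalized linear interpolation between unit quaternions (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    sign = -1.0 if dot < 0.0 else 1.0  # interpolate along the shorter arc
    q = [(1.0 - t) * a + t * sign * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def cross_fade(pose_a, pose_b, alpha):
    """Blend two sampled poses joint by joint.

    alpha = 0 gives pose_a, alpha = 1 gives pose_b.
    """
    blended = []
    for (qa, ta), (qb, tb) in zip(pose_a, pose_b):
        q = nlerp(qa, qb, alpha)
        tr = [(1.0 - alpha) * a + alpha * b for a, b in zip(ta, tb)]
        blended.append((q, tr))
    return blended
```

The blend itself is cheap. What grows is the amount of state involved: every joint now needs samples from two tracks plus a blend factor per character.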
Also, if you need to render the animated characters on the GPU multiple times per frame (e.g. for shadow or reflection rendering passes in addition to the camera pass), that bears consideration. You can gen the pose transform multiple times, once for each pass. But depending on your target GPUs/HW, it’s worth considering whether posing once into temporary GPU-side buffer objects and re-rendering the pre-posed meshes from there might give you a speed-up.
We discussed the posed joint transform generation (“the Matrices”) above. But as to the rest…
The joint indices (vertex attr), joint weights (vertex attr), the rest of the vertex data (vertex attrs), and the indices (for indexed primitive drawing) all get pre-uploaded to the GPU on startup (or scenario load) once and left on the GPU. Typically these are static, and thus there’s no need to regenerate them on-the-fly, re-upload them to the GPU, or pull them back to the CPU.
So unless you’re thinking about something special, we’re really just talking about generation and use of the posed joint transforms. In particular, where that happens and what form they take.
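To put a rough number on that per-frame traffic when the palette is generated on the CPU, here is a back-of-the-envelope calculation in Python. The character and joint counts are made-up example values, and the quaternion+translation entry assumes padding to two 4-float vectors.

```python
FLOAT_BYTES = 4

# Floats stored per joint for each encoding.
FLOATS_PER_JOINT = {
    "mat4": 16,        # full 4x4 matrix
    "mat3x4": 12,      # 3x4 affine matrix (last row implied)
    "dual_quat": 8,    # two quaternions
    "quat_trans": 8,   # quaternion + translation, padded (assumption)
}

def palette_bytes(characters, joints_per_character, encoding):
    """Bytes of posed joint transforms uploaded per frame."""
    return (characters * joints_per_character
            * FLOATS_PER_JOINT[encoding] * FLOAT_BYTES)

# Hypothetical scene: 200 characters with 60 joints each.
for enc in FLOATS_PER_JOINT:
    print(enc, palette_bytes(200, 60, enc), "bytes/frame")
```

For this hypothetical scene that is 768,000 bytes per frame as 4x4 matrices and half that as dual quaternions. Whether the upload matters in practice probably depends less on the byte count than on whether it forces a synchronisation point, as mentioned above.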