I have an interesting dilemma to solve. I work on a sophisticated animation system. Our characters have about 112 bones in their skeletons. Each model has a large number of meshes, and each mesh has its own shader (written in Cg, in our case). Our shaders are generated procedurally based on the material attributes of each mesh.
We skin on the GPU, which in theory is nice and fast, but we have big scalability issues. The problem is that we need to upload the skinning data (3x4 matrices) for each mesh. Our rendering engine sorts by shader (so that we only incur a single setup/teardown per batch of meshes that share a shader), which reorders the meshes. That, combined with the facts that (a) there's probably too much skinning data for a single Cg program to store anyway and (b) you can't share uniform data between two shaders, means that we are continually pushing matrix buffers up to the GPU. Profiling has shown that this quickly becomes the bottleneck as characters are added to our scene. A sketch of the hot path is below.
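To make the pattern concrete, here is roughly what our inner loop looks like. This is a simplified sketch, not our actual code: `Mesh`, `drawSorted`, and the `boneMatrices` parameter name are placeholders, and I'm assuming the standard Cg runtime entry points here.

```cpp
#include <Cg/cg.h>
#include <Cg/cgGL.h>
#include <vector>

struct Mesh {
    CGprogram shader;               // procedurally generated per material
    std::vector<float> bonePalette; // boneCount * 12 floats (3x4, row-major)
    int boneCount;
};

void drawSorted(std::vector<Mesh*>& meshes) {
    // Meshes arrive sorted by shader, so each program binds once per batch.
    CGprogram bound = 0;
    for (Mesh* m : meshes) {
        if (m->shader != bound) {
            cgGLBindProgram(m->shader); // single setup/teardown per batch
            bound = m->shader;
        }
        // The bottleneck: every mesh re-uploads its whole matrix palette,
        // because Cg uniforms belong to a program and can't be shared.
        // "boneMatrices" is a placeholder parameter name.
        CGparameter bones = cgGetNamedParameter(m->shader, "boneMatrices");
        cgGLSetMatrixParameterArrayfr(bones, 0, m->boneCount,
                                      m->bonePalette.data());
        // ... issue the draw call for m ...
    }
}
```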
Ideally - naively - we would upload all the skinning data at once, and then use integer index buffers to select the relevant matrices for each submesh. But that won't work, because the submeshes can, and generally do, use different shaders. So I'm kind of stumped as to how to proceed here. Help would be appreciated.
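For concreteness, here is a minimal sketch of the naive plan: concatenate every character's 3x4 palette into one array and give each submesh a base offset, so per-vertex integer indices select the right matrices. The names (`Character`, `GlobalPalette`, `build`) are hypothetical, not our engine's.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kFloatsPerMatrix = 12; // 3x4, row-major

struct Character {
    std::vector<float> palette; // boneCount * 12 floats
};

struct GlobalPalette {
    std::vector<float> matrices;    // all palettes, back to back
    std::vector<std::size_t> bases; // matrix index of character i's bone 0
};

GlobalPalette build(const std::vector<Character>& chars) {
    GlobalPalette g;
    for (const Character& c : chars) {
        // Record where this character's bone 0 lands in the big array.
        g.bases.push_back(g.matrices.size() / kFloatsPerMatrix);
        g.matrices.insert(g.matrices.end(),
                          c.palette.begin(), c.palette.end());
    }
    return g;
}

// A vertex's matrix index would then be bases[character] + localBoneIndex.
// The catch: uniforms are per-program, so every procedurally generated
// shader would still need its own full copy of `matrices`, and a
// scene-sized float3x4 array likely exceeds one program's limits anyway.
```

The comments at the end are exactly where this plan falls apart for us.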