You can try moving the skinning to a fragment shader. The idea is to store the vertex positions, weights, bone indices and matrices of all models in textures.

```
Texture 1, rgb32f, vertex positions:
AAAAAAAAAAAAAAAAAAAAAAAAAA
AAAABBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBCCCCCCCCCDDDDD
DDDDDDDDDEEEEEEEEEEEEEEEEE
EEEFFFFFFFFFF...
A, B, C, D, E, F are the vertices of animated models A, B, C, D, E, F.
Texture 2, rgba32f, weight map. Contains up to 4 weighting coefficients for each vertex:
AAAAAAAAAAAAAAAAAAAAAAAAAA
AAAABBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBCCCCCCCCCDDDDD
DDDDDDDDDEEEEEEEEEEEEEEEEE
EEEFFFFFFFFFF...
Texture 3, rgba16i, bone index map. Contains the bone indices (or mapping coordinates) per vertex.
Texture 4, rgba32f. Contains the bone matrices of all models in the current animation state.
The texture can be 1D or 2D (e.g. 2048 x 1).
For example:
1. tex = 2048 x 1; 4 sequential texels contain one 4x4 matrix. Max 512 bones.
2. tex = 2048 x 4; 4 vertical texels contain one 4x4 matrix. Max 2048 bones.
3. tex = 4 x 2048; 4 horizontal texels contain one 4x4 matrix. Max 2048 bones.
```
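To make the packing concrete, here is a minimal CPU-side sketch of how the models could be laid out back to back in texture 1. The names `modelOffsets`, `vertexTexel` and `rowsNeeded` are made up for illustration, and the row-by-row fill order is an assumption taken from the diagram above.

```cpp
#include <cassert>
#include <vector>

// Texel position of one vertex inside the packed vertex texture.
struct Texel { int x; int y; };

// Returns, for each model, the linear index of its first vertex when all
// models are stored back to back (A, then B, then C, ...).
std::vector<int> modelOffsets(const std::vector<int>& vertexCounts)
{
    std::vector<int> offsets;
    offsets.reserve(vertexCounts.size());
    int next = 0;
    for (int count : vertexCounts) {
        offsets.push_back(next);
        next += count;
    }
    return offsets;
}

// Maps a linear vertex index to a texel, filling the texture row by row
// (assumed fill order).
Texel vertexTexel(int linearIndex, int texWidth)
{
    assert(texWidth > 0 && linearIndex >= 0);
    return Texel{ linearIndex % texWidth, linearIndex / texWidth };
}

// Number of texture rows needed to hold totalVertices texels.
int rowsNeeded(int totalVertices, int texWidth)
{
    assert(texWidth > 0);
    return (totalVertices + texWidth - 1) / texWidth;
}
```

The same offsets apply to the weight and bone index textures, since all three use one texel per vertex at the same position.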

In the fragment shader, do something like this:

```
uniform sampler2D tex1; // vertex positions
uniform sampler2D tex2; // weights
uniform sampler2D tex3; // bone indices
uniform sampler2D tex4; // bone matrices
uniform mat4 MV;
varying vec2 texcoord;

// GetSkinMatrix fetches 4 texels from tex4 per bone index to build one matrix.
// With up to 4 influence matrices per vertex, that is 16 fetches per vertex.
mat4 GetSkinMatrix(sampler2D m, vec4 weight, vec4 indices)
{
    mat4 ret = mat4(0.0); // start from zero so the weighted sum is correct
    for (int i = 0; i < 4; i++) // loop through indices
    {
        // Bone index -> normalized u at the texel center (assumes a 2048x4 texture).
        // Assumes the indices arrive as floats; an integer format such as
        // rgba16i needs an isampler2D and GLSL 1.30+.
        float u = (indices[i] + 0.5) / 2048.0;
        mat4 smat;
        // Fetch 4 texels and build the matrix. Change these lookups if the
        // matrices are stored differently in the texture.
        smat[0] = texture2D(m, vec2(u, 0.5 / 4.0));
        smat[1] = texture2D(m, vec2(u, 1.5 / 4.0));
        smat[2] = texture2D(m, vec2(u, 2.5 / 4.0));
        smat[3] = texture2D(m, vec2(u, 3.5 / 4.0));
        // Now we have one influence matrix. Multiply it by its weight and
        // add it to the final (per-vertex) skin matrix.
        ret += smat * weight[i];
    }
    return ret;
}

void main()
{
    vec4 pos = texture2D(tex1, texcoord);
    vec4 weight = texture2D(tex2, texcoord);
    vec4 indices = texture2D(tex3, texcoord);
    mat4 skinmatrix = GetSkinMatrix(tex4, weight, indices);
    vec4 result = skinmatrix * pos;
    gl_FragColor = MV * result;
}
```
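For checking the shader output, a CPU reference of the same blend can help. The following is only a sketch: `blendSkinMatrix`, `transform` and `translation` are hypothetical helpers, and the matrices are assumed to be column-major like a GLSL `mat4`.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Vec4 = std::array<float, 4>;
// Column-major 4x4 matrix, like GLSL's mat4: m[col][row].
using Mat4 = std::array<Vec4, 4>;

// Weighted sum of up to four bone matrices (the same blend as GetSkinMatrix).
Mat4 blendSkinMatrix(const std::vector<Mat4>& bones,
                     const std::array<int, 4>& indices,
                     const Vec4& weights)
{
    Mat4 result{}; // value-initialized to all zeros
    for (int i = 0; i < 4; ++i) {
        assert(indices[i] >= 0 && indices[i] < static_cast<int>(bones.size()));
        const Mat4& bone = bones[static_cast<std::size_t>(indices[i])];
        for (int c = 0; c < 4; ++c)
            for (int r = 0; r < 4; ++r)
                result[c][r] += bone[c][r] * weights[i];
    }
    return result;
}

// Returns m * v for a column-major matrix.
Vec4 transform(const Mat4& m, const Vec4& v)
{
    Vec4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[r] += m[c][r] * v[c];
    return out;
}

// Hypothetical helper for the example: a pure translation matrix.
Mat4 translation(float x, float y, float z)
{
    Mat4 m{};
    m[0][0] = m[1][1] = m[2][2] = m[3][3] = 1.0f;
    m[3] = Vec4{ x, y, z, 1.0f };
    return m;
}
```

For example, a vertex at (1, 1, 1) influenced half and half by translations of (2, 0, 0) and (0, 4, 0) should end up at (2, 3, 1). Unused influence slots need a weight of 0 and any valid index.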

You must turn off texture filtering (use nearest sampling) and pass exact mapping coordinates, so that every fetch returns exactly one texel.
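As a rough illustration of what "correct mapping coordinates" means here, the sketch below computes the normalized coordinate of a texel center. The function names are invented for this example, and it assumes the usual convention that texel `i` of an axis with `size` texels covers the range `[i/size, (i+1)/size)`.

```cpp
#include <cassert>

// Normalized coordinate of the center of texel `index` along an axis that
// is `size` texels long. Sampling there with nearest filtering returns
// exactly that texel.
float texelCenter(int index, int size)
{
    assert(size > 0 && index >= 0 && index < size);
    return (static_cast<float>(index) + 0.5f) / static_cast<float>(size);
}

// Inverse mapping under nearest filtering: which texel a coordinate hits.
int texelFromCoord(float coord, int size)
{
    assert(size > 0 && coord >= 0.0f && coord < 1.0f);
    return static_cast<int>(coord * static_cast<float>(size));
}
```

Passing a raw bone index or an edge coordinate such as `i / size` instead of the center is a common source of off-by-one fetches.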

To use this shader, render a screen-aligned quad into an offscreen rgba32f render target. The destination texture will then contain the skinned vertices of all models.

Run the shader again with the previous anim_matrix texture and the previous modelview matrix, and store the result in another offscreen texture.

Alternatively, you can keep the result from the previous frame instead of running the shader twice.

The per-pixel difference between the two textures is the per-vertex 3D motion vector.
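On the CPU the same thing looks roughly like this. It is only a sketch to show the arithmetic: `motionVectors` is a made-up name, and both inputs are assumed to hold the skinned positions in the same vertex order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Per-vertex motion vector: current skinned position minus the previous one.
std::vector<Vec3> motionVectors(const std::vector<Vec3>& current,
                                const std::vector<Vec3>& previous)
{
    assert(current.size() == previous.size());
    std::vector<Vec3> motion(current.size());
    for (std::size_t i = 0; i < current.size(); ++i)
        for (std::size_t k = 0; k < 3; ++k)
            motion[i][k] = current[i][k] - previous[i][k];
    return motion;
}
```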

You can read back both textures into two PBOs, rebind the PBOs as VBOs, and use them in the vertex shader in the real rendering pass.

This is useful in multipass rendering, when you have to deal with lighting and shadows and want to avoid skinning the same model multiple times per frame. With some smart optimisation you can even get instancing for free.

The shader can be extended to offscreen MRT. If you have enough render targets, you can output pos, normal, tangent, binormal and even prev_pos in a single (per-frame) precalc pass.

You can also consider an OpenCL/CUDA version, which may be easier to develop.