Software weighting as fast as Hardware

Now that I understand matrices so well I thought I’d try the weighting extension out again. I figured it out, got it to work, and it’s actually slightly slower than my software method.

Why? It’s supposed to be faster. If not now, will it be in the future? I’m running a GeForce2, so maybe the newer cards will be?

I could rig my program up to switch between software and hardware if enough of you are interested.

Let me know.

Aaron Taylor

New benchmark:

Software = 336 fps
Hardware = 319 fps

I’m disappointed.

Let’s see it, or at least a description of what you’re doing.


It looks to me like you aren’t benchmarking a real-life situation, unless you target your application at about 300 fps. In a real-life situation you may have lots of other things going on, and you will benefit from not doing it in software, because the GPU will offload the CPU (assuming the vertex weighting is actually performed in hardware), leaving more processor time for other things.

[This message has been edited by Bob (edited 10-27-2002).]

Bob, I think I understand what you mean. The only thing the CPU is doing in my program is the deformation. So what you’re saying is that if I had a lot of physics going on, the hardware weighting would take that burden off the CPU, which would make the app run faster.

Even so, the fact still does remain that the CPU is doing it faster than the GPU. Go figure.

Here’s my code V-man:

if(s3dVertexWeightPointerEXT && s3dCaps[S3D_WEIGHTING] == true)
{
	//deform mesh using hardware
	s3dVertexWeightPointerEXT(1, GL_FLOAT, 0, Objects[i].Weights);

	//push out to Hook
	glGetFloatv(GL_MODELVIEW_MATRIX, DeformMat);

	//push out to Hook
	//custom transform

	Objects[i].Draw(Objects[i].Meshes->Vertices, Objects[i].Meshes->Normals, s3dTextureHub.TextureObjects, s3dCaps);
}
else if(s3dCaps[S3D_WEIGHTING] == true)
{
	//deform mesh using software

	//create an inverse of the Hook
	s3dMatCopy16f(HookInverse, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);

	//bring the RelativeMat home relative to the Hook
	s3dMatCopy16f(RelativeMat, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);
	s3dMatMultiply16x16f(RelativeMat, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Origin);

	//start the deformation with the bone frame matrix
	s3dMatCopy16f(DeformMat, skeleton->Bones[Objects[i].BoneAssignment].Frames[CurFrame].Hook);

	//transform by the RelativeMat
	s3dMatMultiply16x16f(DeformMat, RelativeMat);

	//apply custom transformation
	s3dMatMultiply16x16f(DeformMat, skeleton->Bones[Objects[i].BoneAssignment].Custom);

	//bring the matrix back home to its new location
	s3dMatMultiply16x16f(DeformMat, HookInverse);

	//deform mesh
	s3dDeformMesh(VertexBuffer, NormalBuffer, &Objects[i], DeformMat);

	//reorientate object into Hook 3 space
	//push out to Hook

	Objects[i].Draw(VertexBuffer, NormalBuffer, s3dTextureHub.TextureObjects, s3dCaps);
}

[This message has been edited by WhatEver (edited 10-27-2002).]

Heh, by the time it gets to that code there was a lot of tabs…I’ll fix it…

Ok, it makes sense to me now why the software is faster…'cause my CPU is faster than my GPU.

The vertex weighting extension is basically useless. On the GF2, it seems that your vertex processing throughput drops by half (which suggests something about the implementation details of the hardware). Also, each triangle can only be touched by two matrices on a two-matrix card like that, which means you can’t realistically do anything “soft” like a human. Elbows, armpits, neck, etc. will all look really poor.

If you want to do matrix palette skinning in hardware using OpenGL, you should look to GL_ARB_vertex_program, or possibly the vendor-specific extensions that preceded it (NV_vertex_program, EXT_vertex_shader, or whatnot).

For our product (which targets GF2) we coded up skinning in SSE and optimized it as much as we could (alignment, interleave, writing to AGP, etc) and it runs decently fast. About 40,000 tris per frame soft-skinned at 30 fps takes about 10% of a Pentium3/800, if I recall the numbers right. This is split in about 100 separate chunks (each with its own material).
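The SSE loop described above might look something like the following in plain scalar C. This is an illustrative sketch only, not jwatte’s actual code: the `Bone` layout, the function names, and the two-bones-per-vertex indexing are all assumptions; the real SSE version would vectorize this with aligned, interleaved data.

```c
#include <stddef.h>

/* Column-major 3x4 bone matrix: rotation in m[0..8], translation in m[9..11].
 * (Assumed layout for illustration.) */
typedef struct { float m[12]; } Bone;

/* Transform point p by bone b (homogeneous w = 1, so translation applies). */
static void bone_xform_point(const Bone *b, const float p[3], float out[3])
{
    out[0] = b->m[0]*p[0] + b->m[3]*p[1] + b->m[6]*p[2] + b->m[9];
    out[1] = b->m[1]*p[0] + b->m[4]*p[1] + b->m[7]*p[2] + b->m[10];
    out[2] = b->m[2]*p[0] + b->m[5]*p[1] + b->m[8]*p[2] + b->m[11];
}

/* Two bones per vertex: dst = w * B0*src + (1 - w) * B1*src. */
void skin_vertices(const Bone *bones,
                   const unsigned char *bone0, const unsigned char *bone1,
                   const float *weights,         /* one weight per vertex */
                   const float *src, float *dst, /* packed xyz triples    */
                   size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        float a[3], b[3];
        float w = weights[i];
        bone_xform_point(&bones[bone0[i]], &src[i*3], a);
        bone_xform_point(&bones[bone1[i]], &src[i*3], b);
        dst[i*3+0] = w*a[0] + (1.0f - w)*b[0];
        dst[i*3+1] = w*a[1] + (1.0f - w)*b[1];
        dst[i*3+2] = w*a[2] + (1.0f - w)*b[2];
    }
}
```

The SSE version vectorizes exactly this inner loop; writing `dst` straight into AGP memory (as jwatte mentions) lets the card pull the skinned data asynchronously.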

I’m sure you all know this: vertex programs are done in software on a GF2 (in hardware on GF3+),
and the weighting extension is like a specialised vertex program.
Also, I’ve had major problems with the vertex weighting extension in the past on my TNT2 (buggy drivers).
So personally I would ditch weighting, go with a software version, and only use vertex_program if supported.

I agree with jwatte here – this was a feature that didn’t quite pan out. Furthermore, since we don’t support ARB_vertex_blend, and ATI doesn’t support EXT_vertex_weighting, it’s hard to use it portably. (From our point of view, it’s not worth the effort to implement ARB_vertex_blend – we see it as a dead end obsoleted by vertex programs.)

We’re thinking of phasing out support for this extension in a future driver – preferably for all chips, not just the new ones. Does anyone here use the extension in any important application?

  • Matt

Your problem is simple, your CPU is too fast. Purchase a slower CPU or underclock your existing CPU until you get the desired results.

Another way of looking at this is that your CPU is available for other tasks while the GPU performs the transform.

Actually, a slower CPU won’t make a difference, because I have the feeling that in his case (GF2) BOTH are done on the CPU.

Originally posted by mcraighead:
We’re thinking of phasing out support for this extension in a future driver – preferably for all chips, not just the new ones. Does anyone here use the extension in any important application?

  • Matt

Just do it. I never used this extension in my applications, and I think there is no serious application out there using it. And if there is one, I’m sure it also has a CPU-only version for 3D cards that don’t support this extension…

YES. We use it in military applications for skinning character bone systems. From my point of view, and from my experience with high-end gfx HW, the vertex weight extension is vital!!!

That people don’t like it comes down, in my opinion, to a poor understanding and poor usage of the extension. The talk about too few matrices and too little control in the EXT implementation is wrong. You can build very good character animation with only one weight matrix plus a good bone system.

People who say it goes just as fast on the CPU are wrong. I bet they don’t calculate the normals. If we use the software version, we need to recalculate the length of each transformed normal to be able to use it in the weight formula.

Of course it can be implemented in vertex programs, but then you will break the “standard” path on hardware without VP support.

THE BEST solution would be for ATI to implement the EXT version on both Mac and PC, and then you would have platform independence.

Matt, please don’t take it away. Then we must change HW!!

Another issue is of course the simplicity of OpenGL. To use the EXT version you need only a few lines of code. To use a VP you need a lot more code.

Our experience is that a good skinning system will run faster in software than in hardware using this extension. Trying to use this extension will lead to an unacceptable number of matrix state changes, which will in turn cripple your T&L performance – your batches will be too small.

I’m not saying that this extension can’t be useful; simply that we don’t find it interesting ourselves, and in fact we try to discourage developers from using it.

  • Matt

My knowledge of skinning is pretty limited, in that I don’t know how some of it is done, but the basic technique of interpolating between the at-rest vertex and the rotated vertex looks pretty realistic.

Here’s the program I spoke of:

If it doesn’t run, create a shortcut for the exe…dunno why, but some people have to do that. Run the exe, then pull the console down and scroll up using the Page Up key. If you see a line that says “GL_EXT_vertex_weighting”, then the hardware method was found. It’s the same basic technique that I use in software.


The problem with the vertex weighting extension with only two bones is that you only get two bones for the entire TRIANGLE. If I had two bones per vert, I’d be reasonably happy. In fact, that’s what our software transform system uses.

And, yes, the software transform system is faster than the hardware implementation on the GF2, while giving the artists SUBSTANTIALLY better control over vertex weighting, because they get per-vertex, not per-triangle, bone control. Oh, and we do the normals too, although we let the card normalize with GL_NORMALIZE.

We don’t do scale/stretch in our animations, which means that we can transform vertices and normals with the same matrices (using a w of 1 for vertices and 0 for normals).

Last, we can overlap our skinning with OpenGL drawing thanks to the various asynchronous data submission extensions available.
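The w-of-1-for-vertices, 0-for-normals trick mentioned above can be sketched like this (a minimal illustration, not jwatte’s actual code; the matrix is column-major 4×4 as OpenGL stores it):

```c
/* Transform a vec3 through a column-major 4x4 matrix with an explicit
 * homogeneous w:
 *   w = 1 picks up the translation column (positions);
 *   w = 0 ignores it (normals/directions). */
void mat4_mul_vec3(const float m[16], const float v[3], float w, float out[3])
{
    out[0] = m[0]*v[0] + m[4]*v[1] + m[8]*v[2]  + m[12]*w;
    out[1] = m[1]*v[0] + m[5]*v[1] + m[9]*v[2]  + m[13]*w;
    out[2] = m[2]*v[0] + m[6]*v[1] + m[10]*v[2] + m[14]*w;
}
```

With no scale/stretch in the animation, the upper-left 3×3 is a pure rotation, so the same matrix is valid for normals and the transformed normals stay unit length, which is why jwatte can get away with leaving normalization to GL_NORMALIZE.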

If I had two bones per vert, I’d be reasonably happy.

Isn’t this the same thing, jwatte? This is the GL_EXT_vertex_weighting extension Overview.

The intent of this extension is to provide a means for blending
geometry based on two slightly differing modelview matrices.
The blending is based on a vertex weighting that can change on a
per-vertex basis.  This provides a primitive form of skinning.
A second modelview matrix transform is introduced.  When vertex
weighting is enabled, the incoming vertex object coordinates are
transformed by both the primary and secondary modelview matrices;
likewise, the incoming normal coordinates are transformed by the
inverses of both the primary and secondary modelview matrices.
The resulting two position coordinates and two normal coordinates
are blended based on the per-vertex vertex weight and then combined
by addition.  The transformed, weighted, and combined vertex position
and normal are then used by OpenGL as the eye-space position and
normal for lighting, texture coordinate generation, clipping,
and further vertex transformation.

I measured the EXT extension versus software.

In a typical situation I get 1000 fps using the EXT version on a GeForce4 and 750 using the SW version.

In my calcs I have a model matrix M and a weight matrix W, so I get

Vout = w * M * V0 + (1-w) * W * V0

where V0 is one of my vertices and w is the weight.

To use a single-transform system I need to create the transform matrix P = Inv(M) * W,

so I can transform the vertex by

Vsw = w * V0 + (1-w) * P * V0

where Vsw is the transformed software vertex.

This way Vout = M * Vsw is equal to the first equation.
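The single-transform trick can be checked numerically (an illustrative sketch; matrices are column-major 4×4 as in OpenGL, and P must be precomputed as Inv(M)*W):

```c
/* Column-major 4x4 times a point (homogeneous w = 1). */
static void xform(const float m[16], const float v[3], float out[3])
{
    for (int r = 0; r < 3; ++r)
        out[r] = m[r]*v[0] + m[r+4]*v[1] + m[r+8]*v[2] + m[r+12];
}

/* Two-pass blend, as the EXT does it: Vout = w*M*V0 + (1-w)*W*V0. */
void blend_two_pass(const float M[16], const float W[16],
                    const float v0[3], float w, float out[3])
{
    float a[3], b[3];
    xform(M, v0, a);
    xform(W, v0, b);
    for (int i = 0; i < 3; ++i)
        out[i] = w*a[i] + (1.0f - w)*b[i];
}

/* Single-pass software blend: Vsw = w*V0 + (1-w)*P*V0 with P = Inv(M)*W,
 * then Vout = M*Vsw. */
void blend_single_pass(const float M[16], const float P[16],
                       const float v0[3], float w, float out[3])
{
    float pv[3], vsw[3];
    xform(P, v0, pv);
    for (int i = 0; i < 3; ++i)
        vsw[i] = w*v0[i] + (1.0f - w)*pv[i];
    xform(M, vsw, out);
}
```

Both paths give the same Vout, which is the point of folding Inv(M) into P: the software path only ever touches one extra matrix per vertex.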

This can be done fast with SIMD ops, but you still need to create an area to hold the transformed vertices. You also need to keep this area on a per-rendering-thread basis, and if you share geometry with different weight matrices you need to recalc Vsw several times per frame.

Normals are a bit trickier.

Basically I need to exchange P with S = transpose(inv(P)). This matrix has det = 1, but the transformed value S * N0 is still not unit length, which means that

Nsw = w * N0 + (1-w) * unit(S * N0)

This cannot be accomplished with GL_NORMALIZE.

…so, Matt, you’re saying that all extensions that can be replaced by vertex programs and fragment programs should be obsoleted?

If you take the EXT weight version out, you should also take away all the other extensions that can be replaced by fragment programs and vertex programs, to keep OpenGL clean… and then you’d only have some data-transfer extensions plus fragment and vertex programs left…