Good method of boned animations?

ManOfSpace · May 24, 2004, 3:32am

Hi,

Sorry for the glut (ahem) of threads (4 in 24 hours); this is my first full 3D engine, so some things are still new and strange in my eyes…

Just wondering what (in your opinion) is the best method for boned animations, including vertex weighting/skinning.

With say a g2 as the base limit.

Ideally I’d like my engine to support a more modern approach if this offers more flexibility on say a g4/g5. Although I’d like to avoid using vertex shaders as that would mean any animated entities would be fx-less or I’d have to combine all the possible fx ‘skins’ into a single vertex shader…

Unless it’s somehow possible to pass a vertex through two vertex shaders? No, didn’t think so…

I’d prefer to avoid any cpu overheads, but I’m guessing pre-g3 cards have no say in the matter.

What vertex weighting extension would you suggest using?

Also, any other tips, one liners, chat-up lines, feel free please… the more info I have when I begin, the less likely I am to destroy us all with some horrific typo.

CrazyButcher · May 24, 2004, 3:37am

I found it useful to do all on CPU yourself, allows optimization for your specific needs and more flexibility, plus it always works independent of extensions or whatnot…
if you let OGL do it, then you don’t have the “actual mesh” data at hand, i.e. for collision or maybe other things where you would need the shape data. I read in here that shadow volumes would need such data as well.

imported_MattS · May 24, 2004, 4:41am

I second the method of doing skinning on the CPU. I decided on this for many reasons including (but not limited to):

availability of vertex data to CPU
no requirement for all shaders to include skinning code
seems less of a win for multi-pass rendering (esp. shadowing)
wanted (well artists wanted, I hate them) morph targets which combine very badly with shaders so vertex data has to travel from system to graphics memory anyway
more platform independent

Depending upon your minimum CPU requirement I have found that a well implemented SIMD algorithm can really fly, especially on the P4.

Matt

CrazyButcher · May 24, 2004, 5:08am

if SIMD is actually working well…
all my experiments (taking intel papers code) resulted in SIMD being slower than normal FPU
no clue why, tested 3DNow! and SSE… but that’s just my own stupidity hehe

but anyway even without simd fpu normally is fast enough for this stuff

ManOfSpace · May 24, 2004, 5:41am

Hmm, my biggest concern with that is it will not only be a heavy CPU load, but more importantly it will really slow down rendering, as I’ll then have to upload the mesh into its buffer. Normal system-memory drawarrays is very slow (i.e. even with very low-poly scenes) in my experience, even on a p800mhz with a g5.
How did you guys actually go about rendering the new data? Or was it a case of processing a vert and then sending its data using the standard glVertex commands etc.? I think that wouldn’t hold up in a real-world game situation (as opposed to, say, your standard little running man test with nothing else going on).

Christian_SchA_ler · May 24, 2004, 8:29am

SIMD rocks.
If done right, a static 4-bone skinning on the CPU is 100 cycles / vertex. Don’t bother with branching, as it will make your loop slower. That works out to 20 M verts/s on a 2 GHz CPU, but in practice I got 15 M/s because the AGP couldn’t be bothered to transfer more (32 bytes / vertex).

What it boils down to is that hardware skinning is advantageous only if
(1) you are in desperate need of every CPU cycle or
(2) you can get away with 1-pass rendering and
(3) your GFX card can execute a 50+ instruction vertex shader fast enough to outpace the CPU in skinning. Currently there is none(!). (The ATI 9800 XT barely draws level with the CPU at 12 M verts/s.)
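
The figures quoted above can be checked with a little arithmetic. This is a minimal sketch that assumes only the numbers given in the post (100 cycles per vertex, a 2 GHz clock, 32 bytes per vertex); the function names are made up for illustration.

```cpp
#include <cassert>

// Vertices skinned per second if every CPU cycle is spent on skinning.
double vertsPerSecond(double clockHz, double cyclesPerVertex) {
    assert(cyclesPerVertex > 0.0);
    return clockHz / cyclesPerVertex;
}

// Bytes per second that have to cross the bus to ship those vertices to the card.
double bytesPerSecond(double vertsPerSec, double bytesPerVertex) {
    return vertsPerSec * bytesPerVertex;
}
```

With the post's figures, vertsPerSecond(2.0e9, 100.0) gives 20 million vertices per second, and bytesPerSecond(15.0e6, 32.0) gives 480 million bytes per second of vertex traffic at the observed 15 M verts/s.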

CrazyButcher · May 24, 2004, 9:11am

yeah I suppose SIMD rocks, but well, couldn’t get it to work for me

as for drawing the stuff, I just use normal vertex arrays (glDrawArrays/glDrawElements) and, if available, a streamed VBO. The full mesh is always calculated in RAM and then flushed to the VBO with glBufferSubData. No idea if that is the fastest way to do it, though. Likely one could skip the copy in RAM and write straight into the VBO, but I read that mapping pointers is slower than glBufferSubData

Korval · May 24, 2004, 10:57am

Originally posted by Christian_SchA_ler:
“If done right, a static 4-bone skinning on the CPU is 100 cycles / vertex.”

That’s the best you can do? 100 cycles per vertex? 20M vertices per second on a 2GHz chip?

That’s awful, and sufficient reason alone to use vertex programs and hardware skinning.

Ysaneya · May 24, 2004, 12:12pm

Hum, I have to agree. 20 M verts/sec on a 2 GHz machine is not so bad, but that’s assuming 100% CPU usage just for that task. In practice you’ll likely generate, fill and render your animated data at a much lower speed. In these times where some games are starting to require PS 2.0 to run at all, I think you can safely assume that your target supports vertex shaders (even in software, like GF1, GF2, GF4 MX & co.).

Y.

Christian_SchA_ler · May 25, 2004, 1:56am

Originally posted by Korval:
“That’s awful, and sufficient reason alone to use vertex programs and hardware skinning.”

Heh,
it’s close to 100 instructions: adding the 4 matrices up, then transforming position and normal. So it’s approx. 1 SIMD instruction per cycle. If you know how to get it faster, I’d be interested to hear.
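
The two steps described here (add the four weighted bone matrices up, then transform position and normal) might look roughly like this in plain scalar C++. It is only a sketch under stated assumptions, not the poster's SIMD code: the 3x4 matrix layout, the struct names and the function names are all invented for illustration, and the weights are assumed to sum to 1.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Affine bone matrix, 3 rows x 4 columns; column 3 holds the translation.
struct Mat34 { float m[3][4]; };

struct SkinnedVertex {
    Vec3 position;
    Vec3 normal;
    std::array<int, 4>   bone;    // indices into the bone palette
    std::array<float, 4> weight;  // assumed to sum to 1
};

// Step 1: add the four bone matrices up, each scaled by its weight.
// All four are always processed; zero weights are not branched around.
Mat34 blendBones(const Mat34* palette, const SkinnedVertex& v) {
    Mat34 out = {};
    for (int i = 0; i < 4; ++i)
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 4; ++c)
                out.m[r][c] += v.weight[i] * palette[v.bone[i]].m[r][c];
    return out;
}

// Step 2a: positions go through the full matrix, translation included.
Vec3 transformPoint(const Mat34& a, const Vec3& p) {
    return { a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z + a.m[0][3],
             a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z + a.m[1][3],
             a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z + a.m[2][3] };
}

// Step 2b: normals use only the 3x3 part (no translation, no renormalization).
Vec3 transformNormal(const Mat34& a, const Vec3& n) {
    return { a.m[0][0] * n.x + a.m[0][1] * n.y + a.m[0][2] * n.z,
             a.m[1][0] * n.x + a.m[1][1] * n.y + a.m[1][2] * n.z,
             a.m[2][0] * n.x + a.m[2][1] * n.y + a.m[2][2] * n.z };
}

// Skin `count` vertices into separate position and normal output arrays.
void skinVertices(const Mat34* palette, const SkinnedVertex* in,
                  Vec3* outPos, Vec3* outNrm, int count) {
    assert(palette && in && outPos && outNrm && count >= 0);
    for (int i = 0; i < count; ++i) {
        const Mat34 blended = blendBones(palette, in[i]);
        outPos[i] = transformPoint(blended, in[i].position);
        outNrm[i] = transformNormal(blended, in[i].normal);
    }
}
```

Blending the matrices first means position and normal each need only one matrix multiply, whatever the bone count. A SIMD version would do the same arithmetic four floats at a time.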

imported_MattS · May 25, 2004, 2:32am

The way I have implemented rendering is to mesh deform into a system memory vertex buffer. If VBO is available I then load this buffer data the one time. The system memory buffer is 16 byte aligned for speed and each element is 16 bytes in size. If VBO is not used then simply pass the system memory buffer through the usual route, due to its alignment and size it’s still pretty efficient.

I experimented with mapping the VBO buffer into a pointer and writing directly to this in the mesh deformation routine. I found that this was some 35% slower than my current method, even though intuitively there was less to do in this approach. One thing I have noticed is that the returned pointer is not 16-byte aligned, which SIMD needs for good speed. If this is the cause of the slowdown it is an unfortunate downside to using VBO; VAR was more flexible in this regard.
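
The alignment point above can be sketched in a few lines. This is a hypothetical illustration, not the code from the post: Float4, isAligned16 and pickWriteTarget are invented names, and the fallback policy (write to an aligned scratch buffer when the mapped pointer is misaligned) is just one possible way to handle it. It assumes a C++17 compiler, where std::vector honours the alignas on the element type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One 16-byte element, e.g. x, y, z plus a padding float.
struct alignas(16) Float4 { float x, y, z, w; };
static_assert(sizeof(Float4) == 16, "element must be exactly 16 bytes");

// True if p sits on a 16-byte boundary, as aligned SSE loads and stores require.
bool isAligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Decide where the deformation routine should write: straight into the
// mapped pointer if it happens to be aligned, otherwise into the aligned
// system-memory scratch buffer (which then has to be uploaded separately).
Float4* pickWriteTarget(void* mapped, std::vector<Float4>& scratch) {
    assert(isAligned16(scratch.data()));
    return isAligned16(mapped) ? static_cast<Float4*>(mapped) : scratch.data();
}
```

Here `mapped` stands in for whatever pointer the driver hands back when the buffer is mapped; the check itself is independent of OpenGL.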