Would it be faster to directly perform these calculations on each point of the cube? (Before we call glBegin and glEnd, of course.) Most transformation matrices seem to be sparse, so aren’t we wasting time multiplying by and adding zeros?

Would it be faster if you could do most of your calculations this way, and only used the matrix multiplication sparingly?

Sorry in advance if this seems like a dumb question; I really don’t know how hardware-accelerated matrix multiplication works.
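For what it’s worth, the kind of saving you’re asking about can be sketched in C. Both functions below produce the same result for a rotation-about-Z-plus-translation matrix; the column-major layout follows OpenGL’s convention, but the function names are my own and this is just an illustration of the arithmetic, not what any driver actually does:

```c
/* General 4x4 * vec4 in OpenGL's column-major layout:
   16 multiplies and 12 adds per vertex, even when most entries are 0. */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = m[i]      * v[0] + m[4 + i]  * v[1]
               + m[8 + i]  * v[2] + m[12 + i] * v[3];
}

/* The same transform (rotation about Z plus a translation) written out
   with the known zeros removed: 4 multiplies and 4 adds per vertex. */
void rotz_translate(float c, float s, float tx, float ty, float tz,
                    const float v[4], float out[4])
{
    out[0] = c * v[0] - s * v[1] + tx;
    out[1] = s * v[0] + c * v[1] + ty;
    out[2] = v[2] + tz;
    out[3] = 1.0f;
}
```

So yes, the naive multiply does waste arithmetic on the zeros; whether that waste matters against hardware that does the full multiply in parallel is the real question.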

First off, intelligent drivers don’t apply matrices which don’t affect the vertices (i.e. identity matrices). Beyond that, newer T&L cards perform glRotate etc. plus the matrix multiplications in hardware, so they will be really hard to beat, and even if you did beat them, you would still be wasting the power of the graphics card and would have less CPU time for other stuff like AI.
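Just to illustrate the first point: the kind of identity test a driver might do before bothering with the full multiply could look something like this. This is a guess at the idea, not actual driver code:

```c
#include <stdbool.h>

/* If the current matrix is the identity, vertices can be passed through
   untouched; a driver can test this once per matrix change, not per vertex. */
bool mat4_is_identity(const float m[16])
{
    for (int i = 0; i < 16; ++i) {
        /* the diagonal entries are m[0], m[5], m[10], m[15] */
        float expect = (i % 5 == 0) ? 1.0f : 0.0f;
        if (m[i] != expect)
            return false;
    }
    return true;
}
```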

I agree with Michael; the newer cards are designed to do those matrix calculations really fast. I can’t imagine that calculating each point yourself would be faster.

>>I really don’t know how hardware accelerated Matrix multiplication works

I don’t want to be rude or anything, but with a statement like that, I really don’t think you are deep enough into OpenGL, or 3D overall, to try to optimize transformations. Transformations are one of the last things you should try to optimize. As Michael said, beating the HW T&L, which is supposed to be free (to a certain degree), is going to be REALLY difficult. I am almost 100% sure that doing what you mention will be slower, even though it’s simpler and involves less math.

I can imagine that HW T&L CAN be beaten if you are running two separate threads on a multiprocessor system, with one processor doing nothing but transformations. But even then, I’m not sure about it.

I tried this some months ago…
Even with sine and cosine tables, my (simple) software renderer was slower than OpenGL.
Like you said: beating HW T&L in software isn’t really possible.

Guys, no criticism, but I think you focus too much on how to transform fast enough (though it is important). There may be situations where some specific transformation can be done faster on the CPU than on the GPU, no doubt. But say your routine transforms n vertices in m seconds and the GPU does the same in 1.5·m seconds: if you run it on the CPU, the GPU sits idle for those m seconds while the CPU is busy. If there is any chance to take work off the CPU by doing it somewhere else, do it. The GPU will be fast enough, and the CPU can focus on game play.