The red book advises in the OpenGL Performance Tips (App. G) to use glRotate* instead of creating my own rotation matrix and using glMultMatrix*. What is the reasoning?

I use quaternions for orientation. Creating the matrix for glMultMatrix involves a bunch of adds and multiplies.

Computing the angle and axis for glRotate requires sqrt() (or sin()) and acos(), and then OpenGL is going to have to do a sin() and cos() on the angle!

The OpenGL driver must treat matrices specified by glMultMatrix*() as general matrices, which eliminates some opportunities for optimization. If you don’t find it a performance penalty, I wouldn’t worry about it.

Always use glLoadIdentity() instead of glLoadMatrix*() with an identity matrix, though.

Cass, what kind of scope do those glRotate vs glMultMatrix optimizations have these days? Is it just matrix ops, or vertex transformation as well? I assumed that T&L hardware would be optimized for straight vec4*mat44 multiplies and wouldn’t bother with the special cases.

Based on that assumption, I transform everything to eyespace and just do a glLoadMatrix. Flat eyespace transforms are very handy for frustum culling and for submitting geometry from the same model in multiple shader passes without walking the scenegraph tree all over again, but is it likely to be suboptimal perf-wise?

I feel like adding my 2 cents, even though I don’t really know anything about this.

MikeC -

I’m guessing that the driver performs your glRotate or glTranslate calls on the CPU, not the GPU. I bet that matrix isn’t sent to the GPU until you call your first glVertex*() function. My point is that hardware T&L isn’t there to execute these few simple commands; it’s there to take the resulting matrix and apply it to millions of vertices.

Now for some further speculation:

Jambolo -

I’m guessing that in general your quat->Matrix4->glMultMatrix* will be faster than their glRotate* for the reasons you mentioned.

Remember, though, that there are a lot of common non-general cases, like rotating around a single principal axis, where you’ll get lots of zeros in your matrix. You can save a lot of time (and they probably do) by not doing these multiplications by zero. Even a general rotation matrix, in its 4x4 form, will have 6 zeros (three in the bottom row and three in the last column) which can be optimized out of a multiplication routine.

A pretty narrow scope. If you’ve looked at NV_vertex_program, you know what the hardware does.

If it’s more convenient to perform a glMultMatrix*(), then that’s probably what you should do.

For culling strategies, I would probably cull based on bounding boxes and let the hardware handle the transforms.

I would flatten the transform hierarchy where possible. If you’re not getting tons of re-use out of your geometry, paying to compute all those inverses and update the matrices is pretty expensive.

Originally posted by cass:
For culling strategies, I would probably cull based on bounding boxes and let the hardware handle the transforms.

I would flatten the transform hierarchy where possible.

Yes, I do, and eyespace seems as good a choice as any. Unless I’ve been missing something, any culling approach requires transforming either the culling volume or the potential cullee to get them both into the same coordinate space, and since the frustum volume is locked to eyespace anyhow…
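For what it’s worth, the eyespace test I have in mind looks roughly like the sketch below. The Plane type and function names are invented for the example; it assumes a rigid-body modelview matrix (so a bounding sphere’s radius is unchanged), column-major storage, and frustum planes given in eyespace with unit normals pointing into the frustum.

```c
/* Hypothetical plane type: a*x + b*y + c*z + d = 0, with a unit
   normal (a, b, c) pointing into the frustum. */
typedef struct { float a, b, c, d; } Plane;

/* Transform point p by a column-major 4x4 matrix, assuming w = 1. */
void transform_point(const float m[16], const float p[3], float out[3])
{
    for (int row = 0; row < 3; ++row)
        out[row] = m[row] * p[0] + m[4 + row] * p[1]
                 + m[8 + row] * p[2] + m[12 + row];
}

/* Returns 1 if the sphere lies entirely outside at least one plane,
   0 if it may be visible. The modelview is assumed rigid-body, so
   the radius needs no rescaling. */
int sphere_culled(const float modelview[16], const float center[3],
                  float radius, const Plane *planes, int plane_count)
{
    float eye[3];
    transform_point(modelview, center, eye);

    for (int i = 0; i < plane_count; ++i) {
        float dist = planes[i].a * eye[0] + planes[i].b * eye[1]
                   + planes[i].c * eye[2] + planes[i].d;
        if (dist < -radius)
            return 1;
    }
    return 0;
}
```

Only the sphere’s center gets transformed, and the same flattened matrix can then be handed to glLoadMatrixf() for anything that survives.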

The project is a space-sim type thing, where pretty much every scenegraph node can move every frame, so “permanently” flattening the transform hierarchy isn’t an option.

If you’re not getting tons of re-use out of your geometry, paying to compute all those inverses and update the matrices is pretty expensive.

You mean the driver having to calculate inverses? Oh. I asked quite a while back about what priority to give transform when sorting geometry into buckets for rendering, and one of the NV guys (I think it was you) said that the cost of gl*Matrix ops was down in the noise.

On the application side, I’m only using rigid-body matrices for transform, so calculating inverses is trivial - a mat33 transpose and a mat33*vec3 multiply.
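A minimal sketch of that inverse, assuming a column-major 4x4 matrix of the form [R | t] with R a pure rotation (no scale or shear). The function name is just for illustration.

```c
/* Invert a rigid-body transform [R | t] stored column-major.
   The inverse is [R^T | -R^T * t]. inv must not alias m. */
void rigid_inverse(const float m[16], float inv[16])
{
    float tx = m[12], ty = m[13], tz = m[14];

    /* Upper-left 3x3: transpose of the rotation. */
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 3; ++row)
            inv[col * 4 + row] = m[row * 4 + col];

    /* Translation: -R^T * t, using the rows of the transposed block. */
    inv[12] = -(inv[0] * tx + inv[4] * ty + inv[8]  * tz);
    inv[13] = -(inv[1] * tx + inv[5] * ty + inv[9]  * tz);
    inv[14] = -(inv[2] * tx + inv[6] * ty + inv[10] * tz);

    /* Bottom row stays (0, 0, 0, 1). */
    inv[3]  = 0.0f;
    inv[7]  = 0.0f;
    inv[11] = 0.0f;
    inv[15] = 1.0f;
}
```

That is 9 multiplies plus some copying, versus a general 4x4 inversion.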