Does the GeForce DDR accelerate fully general texture matrix setups ?

Using VertexArrayRange on GeForce DDR ( actually VAR2 ). For 12000+ vertices I get 25 fps using automatic texture coordinate generation with a texture matrix such as the one below.

Using glGetFloatv( GL_TEXTURE_MATRIX, MatrixElements ) the matrix looks something like the following,

0.519930 -0.411105 0.246303 373.963257
-0.019760 -0.790561 -0.262807 378.274323
0.042759 -0.823818 0.568691 765.119751
0.042675 -0.822211 0.567581 771.619019

( the bottom row is the one that is usually 0 0 0 1 ).

If the elements 0.042675 and -0.822211 are zero then the program runs at full speed ( approx. 120 fps, but the results are of course incorrect ). For non-zero elements this drops to 25 fps. It appears as if the driver is forced onto a software path and therefore transforms the coords from AGP. I did test the program using standard GL arrays, however in that case there’s a performance drop due to geometry transfers.

I moved the projective texture coordinate generation to the CPU, copying 4D texture coords to AGP each frame, and the program runs at full speed again.

I’ve tested the program in Win98 and Win2000 using WHQL drivers 12.41 and Beta 12.60, so the only conclusion I can draw is that the GeForce DDR does not accelerate fully general texture matrices (?). I use texgen for everything else and this is very fast on the GeForce.


  • Paul

It’s good to know that nVidia is cutting corners on their hardware. I guess you ran into one of them.

The same thing might happen on Modelview matrices too.


Could you send a simple test case that illustrates the problem?

Thanks -


Basically I’m just projecting a texture using the techniques you describe in your paper. In the CPU-assisted version I calculate the 4D texture coordinates resulting from the matrix,

Mat = T * S * P * M

where T is the same as glTranslatef( 0.5f, 0.5f, 0.0f ) and S is glScalef( 0.5f, -0.5f, 1.0f ). P and M are the projection and modelview matrices respectively.

The P matrix I create like this,

float X = 2.0f * ZNear / ( XMax - XMin );
float Y = 2.0f * ZNear / ( YMax - YMin );
float C = ( ZFar + ZNear ) / ( ZFar - ZNear );
float D = -( 2.0f * ZFar * ZNear ) / ( ZFar - ZNear );

P.m11 = X; P.m12 = 0; P.m13 = 0; P.m14 = 0;
P.m21 = 0; P.m22 = Y; P.m23 = 0; P.m24 = 0;
P.m31 = 0; P.m32 = 0; P.m33 = C; P.m34 = D;
P.m41 = 0; P.m42 = 0; P.m43 = 1; P.m44 = 0;

and the M matrix,

M.m11 = Rx.x; M.m12 = Rx.y; M.m13 = Rx.z; M.m14 = Rx.Dot( -Origin );
M.m21 = Ry.x; M.m22 = Ry.y; M.m23 = Ry.z; M.m24 = Ry.Dot( -Origin );
M.m31 = Rz.x; M.m32 = Rz.y; M.m33 = Rz.z; M.m34 = Rz.Dot( -Origin );
M.m41 = 0.0f; M.m42 = 0.0f; M.m43 = 0.0f; M.m44 = 1.0f;

So these are just like GL’s. The only difference is that I have the positive Z-axis pointing into the screen ( opposite of GL ).

The elements I was referring to in my previous post were m41 and m42 of Mat. Just setting these to zero will increase the fps from 25 to 120. I know there’s a performance hit from using non-identity texture matrices, but this is the first time I’ve experienced such a huge drop. Both the CPU version and the GPU version look the same, but for once the CPU version runs 4x faster.

  • Paul

Originally posted by Korval:
The same thing might happen on Modelveiw matrices too.

I use the same code for texture projection as I do for the camera. This hasn’t been a problem. I actually tried projecting a texture using the exact same matrices that I use for the camera and it still runs slowly.

I hope I’m mistaken and that the GeForce1 DDR really can do general 4x4 matrix multiplications on texture coords. Maybe Cass or mcraighead can confirm this ?

  • Paul

Just a small update…

A friend has just tested the program on a GeForce3 and it runs at 160fps ( where the GF1 ran at 25fps using TexGen/Texture matrices ).

  • Paul

I just ran that projective texture demo thingy in the NVEffectsBrowser and I was getting almost 1200 fps. But when I started to move the projector around it dropped to about 12 to 30 fps. I’m using a GeForce DDR 32 MB. Why does the framerate drop so much when the projector is moved around?


Specifically which demo are you talking about? Is it the one that has shadows?



GeForce 256 T&L has a hardware issue with projective texture matrices that forces a software fallback. As mentioned in the vertex array range whitepaper, any software fallback is particularly painful when using VAR because the driver has to read from uncached memory.

There is a hardware accelerated application work-around if you’re using object linear or eye linear texgen. You can simply bake the texture matrix directly into the texgen planes. This is actually more efficient as well.

There is a related issue that comes up when using a non-identity texture matrix with cube map texture coordinates (s,t,r) on GeForce 256. See my post on the “Cube mapping disabling VAR…” thread in this forum for details.

This issue exists ONLY for GeForce 256 (DDR & SDR) and Quadro hardware. No other GeForce class hardware is affected. Specifically, no variety of GeForce2, GeForce3, Quadro2 or Quadro DCC is affected.

Thanks -

Thanks, Cass. Moving everything to the texgen planes and using an identity texture matrix worked. Using VAR is a great way to find out if you’re running hardware accelerated or not.

If it’s not a secret, I would like to know what specifically makes the driver choose the software path.

  • Paul

[This message has been edited by PH (edited 06-15-2001).]


As you’ve already hinted, the issue is with non-zero entries in the 3rd row of the texture matrix. There are really only 3 rows in the texture matrix for GeForce{1|2} hardware since it doesn’t support 3D textures. For projective texturing the rows are (s,t,q) and for cube mapping they’re (s,t,r), but it’s always the 3rd one.

Thanks -

Originally posted by cass:
There is a hardware accelerated application work-around if you’re using object linear or eye linear texgen. You can simply bake the texture matrix directly into the texgen planes. This is actually more efficient as well.

Would it be possible to do such a workaround at the driver level? Will this be fixed in the next Detonators?