Vertex cache in GeForceFX


I’m having trouble moving my code from an ATI Radeon 9000 card to my new GeForceFX 5600. I was unable to determine the size of the vertex cache; my simple test shows that there’s no post-TNL caching at all, but that’s absurd, isn’t it? As a result, vertex processing speed is amazingly slow: I can push only 11 Mtri/sec against 30 Mtri/sec on my old Radeon. Does anyone have similar difficulties?

How exactly do you feed the vertices to the gfx card?
I’m assuming you use indexed vertices with glDrawRangeElements, maybe even with VBOs. In that case everything should work fine.

Also, you have to be sure that you are transform-limited, or else your test won’t make any sense.


I’m using a VBO with a single DIP call. And it’s definitely not fillrate-limited.

So you think the hardware is OK? I was thinking maybe NVIDIA removed hardware TCL from the GeForce completely, since the 5600 is not a high-end video card.

[This message has been edited by Galstaff (edited 01-14-2004).]

A GeForce FX 5600 is a high-end card, if I am not totally wrong.

Removing the post-T&L cache would be a very, very stupid thing, so I am quite sure nVidia would never do that.

But what’s a DIP call? I don’t know what you mean by that abbreviation.


Hardware TNL is not gone from any GeForce product. It’s ATI that did that with the Radeon 7000 and 9100 IGP…

Maybe you’re not vertex transfer bound, but bound somewhere else? Maybe you’re off the fast path for some reason? For example, index buffers should be in system memory for GeForce cards.

Try running VTune (if you’re on an Intel CPU; otherwise AMD’s profiler) and see where the hold-up is. Perhaps you’re spending time copying or converting data somewhere in the driver?

Ok guys, thanks for the help. So your guess is that my code is inadequate for the GeForce.

Jan, DIP is an abbreviation of DrawIndexedPrimitive.

JWatte, VBO is not as low-level as VAR, so it’s the driver’s decision where it’s going to put my IB.


  • size of your vertex format, strange combinations of vertex formats, use of non-standard types (shorts, bytes?)
  • alignment issues
  • is the VBO static or dynamic? In the second case, are you mapping the buffer? Then be sure you don’t read from the mapped memory; only write to it, and only sequentially, without “holes”.
  • size of the vertex buffer, number of vertices/indices rendered per call? Maybe it’s too high, or too low…?

DIP is Direct3D-specific language; not everybody’s familiar with it on these forums.


[This message has been edited by Ysaneya (edited 01-15-2004).]

I was unable to determine the size of vertex cache

I’m concerned about this. How are you attempting to determine the size of the post-T&L cache? Perhaps you are using VBOs in a fashion that worked fast on ATi cards but does something odd on nVidia cards.

Also, you are using the most current FX drivers, right?

Lastly, I’m not certain that nVidia’s VBO implementation is as fast as using VAR yet. I seem to recall reading that somewhere on this forum, but I’m not entirely positive. If it turns out that VBOs currently aren’t as fast as VAR (which is as fast as the hardware can go), then you can ignore the result and wait until nVidia irons out their VBO implementation.

There’s definitely a post-T&L cache on the GeForceFX 5600. If you post your code, maybe somebody will be able to help you out.

Originally posted by jwatte:
Index buffers should be in system memory for GeForce cards.

Ugh, I didn’t know that.
Does it make sense to write a piece of code to figure out where to put index buffers? If yes, how could this be done?

If you’re using vertex buffer objects (ARB_vertex_buffer_object) and specify that the buffer is for indices, the driver will put it in the optimal place for each card.

What about VAR2?
You can put indices into AGP with that.