hardware t&l

hardware t&l looks like a great way to speed up rendering, but i’ve never looked at it in detail.
so my question is: if i’ve got some triangles to draw, how do i tell my gfx board (geforce3) to do all the transform and lighting calculations instead of the CPU? is there an easy-to-understand tutorial on that?


If you have hardware T&L, you get it automatically.

Yes, very easy… do nothing special. Under OpenGL, hardware acceleration is automatic if your card and its driver support it. Of course there are some things you can do to help it, such as optimizing for the vertex cache and avoiding overly complex CPU-based LOD schemes in favor of systems that take advantage of the GPU’s power.
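
To make "optimizing for the vertex cache" a bit more concrete, here is a rough sketch in C that estimates how well an indexed triangle list reuses already-transformed vertices. It assumes a simple FIFO cache model; the real cache size and replacement policy depend on the card, so `cache_size` and both function names are assumptions for illustration only, not anything from a driver or SDK.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64  /* upper bound for the simulated cache (assumption) */

/* Count how many indices miss a simulated FIFO post-transform cache.
   Every miss stands for one vertex the card has to transform again. */
unsigned count_cache_misses(const unsigned *indices, size_t index_count,
                            unsigned cache_size)
{
    unsigned cache[MAX_CACHE];
    unsigned used = 0;    /* number of valid entries */
    unsigned next = 0;    /* slot that gets overwritten next (oldest entry) */
    unsigned misses = 0;
    unsigned j;
    size_t i;

    assert(cache_size >= 1 && cache_size <= MAX_CACHE);

    for (i = 0; i < index_count; ++i) {
        int hit = 0;
        for (j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            cache[next] = indices[i];          /* replace the oldest entry */
            next = (next + 1) % cache_size;
            if (used < cache_size) ++used;
            ++misses;
        }
    }
    return misses;
}

/* Average cache misses per triangle: 3.0 means no reuse at all,
   lower values mean fewer vertices are transformed per triangle. */
double acmr(const unsigned *indices, size_t index_count, unsigned cache_size)
{
    size_t tris = index_count / 3;
    if (tris == 0) return 0.0;
    return (double)count_cache_misses(indices, index_count, cache_size)
           / (double)tris;
}
```

Run it over your index buffer before and after reordering the triangles. If the number drops, the card should have fewer vertices to transform for the same mesh.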

it is automatic? very cool…
BUT afaik win2k ships with OpenGL 1.1. does that handle t&l, or is it actually decided at runtime when the nvidia driver kicks in?

still, it surprises me, because my engine is only about 1/3 faster on a 1.4 GHz AMD with a geforce3 than on an 866 MHz P3 with a crappy cheap ATI board. i thought the difference would be somewhat more stunning, especially with all those nvidia demos floating around pushing 100,000 polys 25 times a second…

As long as you have the hardware’s drivers installed, you will get the hardware T&L. If you are only using the Microsoft generic (GDI) software renderer (slow as dirt), you won’t get ANY hardware acceleration. The reason you aren’t getting the performance you expected is optimization. It can be very difficult to get max performance, and there can be bottlenecks everywhere in the pipeline: vertex cache problems, vertex memory alignment issues, excessive render state changes, unnecessary operations, pipeline stalls, etc.
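
A quick way to check which renderer you actually ended up with is to look at the strings OpenGL reports. Here is a small sketch in C. The helper name is made up for illustration, and it relies on Microsoft's software implementation reporting "Microsoft Corporation" as vendor and "GDI Generic" as renderer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the vendor/renderer strings look like
   Microsoft's generic software implementation, 0 otherwise. */
int is_generic_software_renderer(const char *vendor, const char *renderer)
{
    /* glGetString returns NULL when no context is current */
    if (vendor == NULL || renderer == NULL) return 0;
    return strstr(vendor, "Microsoft") != NULL &&
           strstr(renderer, "GDI Generic") != NULL;
}
```

In a real program you would call it once a rendering context is current, passing `(const char *)glGetString(GL_VENDOR)` and `(const char *)glGetString(GL_RENDERER)`. If it returns 1, the vendor driver was not picked up (often a pixel format issue) and nothing runs on the card.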

i think the bottleneck in my engine is the massive amount of recursion and triangle splitting (ROAM-kinda engine)

thx anyway

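Yes, that is probably it. ROAM is absolutely horrible for hardware T&L: the CPU spends its time re-splitting triangles every frame, so the card mostly sits there waiting for geometry.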
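
One common alternative (not ROAM, and not anything from the engine discussed here) is to cut the terrain into fixed chunks, precompute a few detail levels per chunk, and only pick a level per chunk each frame. A minimal sketch in C of that selection step follows. The function name, the doubling distance bands, and all parameters are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical helper: pick a precomputed detail level for a terrain chunk.
   Level 0 is the most detailed; each further level covers a distance band
   twice as wide as the previous one (an arbitrary choice for this sketch). */
int select_lod(float distance, float base_distance, int lod_count)
{
    int lod = 0;
    float limit = base_distance;

    assert(lod_count >= 1 && base_distance > 0.0f);

    while (lod < lod_count - 1 && distance > limit) {
        limit *= 2.0f;   /* next band ends twice as far away */
        ++lod;
    }
    return lod;
}
```

The per-frame CPU work is then one comparison loop per chunk instead of recursive splitting per triangle, and each level's vertices can stay in static arrays that the card transforms by itself.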

Yes, the GeForce cards use hardware T&L automatically. But with the VAR extension (NV_vertex_array_range) you can speed your rendering up dramatically if you use it the right way. In my case it is in places around 3 times faster than without. You should download and take a look at learning_var in the NVIDIA SDK. You can get it at www.nvidia.com. It shows an example of how to make use of AGP memory and fences. Well… on my PC AGP is not much faster than system memory, but if I use the video memory… well, nice little boost.