Rendering hardware and transformations...

I need to fill some gaps in my understanding of rendering hardware and of performing vertex transformations in hardware. Any info is highly appreciated.

First, why is it that vertex transformations and triangle rasterization are interleaved? I.e., all cards I’ve seen transform and rasterize one triangle at a time before moving on to the next triangle. It seems like there are always cases where the bottleneck is either rasterization or transformation. Why not separate these two operations so that some triangles are being transformed while others are being rasterized? I would think that this would keep the hardware better utilized, rather than always making one unit wait for the other to finish.

Second, is it possible to transform all vertices in a vertex array, store the transformed vertices in video memory, and then simply use them when rendering? In cases where you can’t render your triangles in an order where they’ll share a lot of vertices, it seems like this would be faster.

Thanks in advance for any info =).

Jonathan Dinerstein
jondinerstein@yahoo.com

I know of no hardware that transforms and then rasterizes triangles strictly in sequence. Any modern card can be thought of as having a number of blocks, each of which feeds the next: the geometry read block fetches data over AGP and feeds it into the geometry transform block; the transform block feeds transformed verts into triangle set-up; triangle set-up feeds raster positions and interpolation coordinates into the fragment shader block, which feeds pixels into the framebuffer.

Because OpenGL is (or should be) a black-box pipeline, where you pour vertices in on one side and get pixels out on the other, all of these blocks can work in parallel; while the transform block works on vertex N, the fetcher can already be fetching data for vertex N+1. In essence, the entire OpenGL pipeline is, well, pipelined.
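Just to make the "blocks in flight" idea concrete, here is a tiny toy simulation (plain C, nothing to do with any real driver or chip; the stage names and tick model are made up for illustration) that prints which triangle each stage is busy with on each clock tick:

```c
#include <stdio.h>

enum { NUM_STAGES = 4 };   /* fetch, transform, triangle set-up, rasterize */

static const char *stage_name[NUM_STAGES] = {
    "fetch", "transform", "setup", "rasterize"
};

int main(void)
{
    int num_triangles = 6;

    /* On each "tick", every stage works on a different triangle, so the
     * units stay busy instead of waiting for each other to finish. */
    for (int tick = 0; tick < num_triangles + NUM_STAGES - 1; ++tick) {
        printf("tick %d:", tick);
        for (int stage = 0; stage < NUM_STAGES; ++stage) {
            int tri = tick - stage;   /* triangle occupying this stage */
            if (tri >= 0 && tri < num_triangles)
                printf("  [%s: tri %d]", stage_name[stage], tri);
        }
        printf("\n");
    }
    return 0;
}
```

Once the pipeline is full, every stage has a triangle to chew on every tick, which is exactly why the hardware doesn't transform a triangle, then rasterize it, then start the next one.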

Of course, I have never worked at a chip company, and I haven’t actually ever talked to any graphics hardware designers, so I could be sneezing in the wind entirely. But if you think of the pipeline like this, you’re likely to have a useful concept of how the hardware works, even if implementation details may differ.

>>Second, is it possible to transform all vertices in a vertex array, store the transformed vertices in video memory, and then simply use them when rendering? In cases where you can’t render your triangles in an order where they’ll share a lot of vertices, it seems like this would be faster.<<

If you know in advance which vertices you don’t need, use a glDrawElements call with the indices of the used vertices from your big vertex array, and the HW T&L will do the rest (including the sharing, if any) for you.
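For example, something like this minimal sketch in classic vertex-array-style GL (the vertex data and the draw_quad name are just for illustration) lets the hardware transform only the referenced vertices and reuse the shared ones:

```c
#include <GL/gl.h>

static const GLfloat verts[] = {
    /* x,    y,    z  -- four shared corners of a quad */
    -1.0f, -1.0f, 0.0f,
     1.0f, -1.0f, 0.0f,
     1.0f,  1.0f, 0.0f,
    -1.0f,  1.0f, 0.0f,
};

/* Two triangles sharing two of the four vertices; the index list tells
 * the driver/hardware which vertices are actually used. */
static const GLushort indices[] = { 0, 1, 2,   0, 2, 3 };

void draw_quad(void)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, verts);

    /* HW T&L transforms only the referenced vertices and can reuse the
     * shared ones rather than transforming them twice. */
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, indices);

    glDisableClientState(GL_VERTEX_ARRAY);
}
```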