Rendering loop under-performs: help?


I’m having trouble getting my rendering loop up to snuff. I’m currently getting only 20,000 triangles per second on my system at home. A GLperf test case with similar GL states indicates my system should be capable of over 100,000 triangles per second.

I profiled my code, and have run through numerous optimization cycles already. I seem to be at a plateau right now, though. My code currently processes a list of the minimal object data which must be rendered into the scene. It does a small amount of display cache checking, then renders the OpenGL primitives which compose the object.

Timing indicates that over 90% of my function time is spent in the glDrawArrays call. Replacing it with glDrawElements had no impact. On a V3 system, replacing the call with manual stepping through the data arrays (glBegin/glVertex3fv/glEnd sets) significantly improved performance (though not enough), but had no effect on nVidia-based test systems. Interleaved arrays do not significantly alter performance (if anything, a minor performance /drop/).

I’m currently using the vertex, normal, and texture coordinate arrays. All data is tightly packed in separate single-precision floating-point arrays.
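For readers comparing the two layouts mentioned above, here is a minimal sketch of separate, tightly packed arrays versus one interleaved record per vertex. The names SeparateArrays, InterleavedVertex and interleave are hypothetical, and two-component texture coordinates are an assumption (the post does not say how many components are used). The interleaved field order follows the standard GL_T2F_N3F_V3F format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout 1: one tightly packed float array per attribute.
// With this layout the stride passed to glVertexPointer, glNormalPointer
// and glTexCoordPointer is 0, which means "tightly packed".
struct SeparateArrays {
    const float* positions;   // 3 floats per vertex
    const float* normals;     // 3 floats per vertex
    const float* texCoords;   // 2 floats per vertex (assumption)
    std::size_t vertexCount;
};

// Layout 2: one record per vertex, in GL_T2F_N3F_V3F order
// (texture coordinate, then normal, then position).
struct InterleavedVertex {
    float texCoord[2];
    float normal[3];
    float position[3];
};

// The record must contain exactly 8 floats with no padding,
// otherwise it would not match the interleaved format.
static_assert(sizeof(InterleavedVertex) == 8 * sizeof(float),
              "InterleavedVertex must have no padding");

// Copy separate arrays into a single interleaved buffer.
std::vector<InterleavedVertex> interleave(const SeparateArrays& src) {
    assert(src.vertexCount == 0 ||
           (src.positions && src.normals && src.texCoords));
    std::vector<InterleavedVertex> out(src.vertexCount);
    for (std::size_t i = 0; i < src.vertexCount; ++i) {
        for (int k = 0; k < 2; ++k) out[i].texCoord[k] = src.texCoords[2 * i + k];
        for (int k = 0; k < 3; ++k) out[i].normal[k]   = src.normals[3 * i + k];
        for (int k = 0; k < 3; ++k) out[i].position[k] = src.positions[3 * i + k];
    }
    return out;
}
```

Both layouts carry the same 8 floats per vertex, which fits the observation that switching between them made little difference.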

I simply have no clue how to get the improvement that GLperf shows me is possible.

I can implement more optimizations based on rejecting data, but they will not boost the triangle output of my function.

Does anyone know what could account for the performance difference? My head aches.

Thanks for any insight you can offer.
– Jeff

[This message has been edited by Thaellin (edited 01-30-2001).]

20,000 triangles per second is not very much. Check out the performance FAQ from nVidia.

I agree that 20,000 triangles per second is not much. On the nVidia systems I’m currently only getting approximately 70,000 triangles per second. Either way - it’s not enough.

Thank you for the FAQ tip: I had not read nVidia’s T&L performance FAQ, and some of the information will prove useful.

The FAQ suggests modifying the function calls I use. To some extent I had already done this. I am currently using what nVidia considers the ‘best’ choice for most of my rendering functions (I’m using the best choice which is appropriate for the data being processed).

The FAQ also provides information on optimal data formats - I had chosen one of the optimal sets. It suggests things to avoid, which I have…

I had already applied the techniques from SGI’s performance optimization tips. I’ve reviewed the nVidia T&L FAQ. There is still something absolutely wrong with what I’ve written, else it would move faster.

Simply put: without optimization tricks, GLperf is showing me significant triangle-per-second numbers which I cannot approach.

I know it may be unreasonable to think that someone could point out the problem without looking at either my or GLperf’s code. I really hoped there was some obvious stumbling block which I’ve failed to avoid…

Anyway - cool FAQ. Any other information is still extremely welcome.

– Jeff

I found the problem, and it was as obvious as I thought.

The data being sent to my engine was organized in triangle strips, with a very low triangle count. I was calling glDrawArrays once for every strip, which maxed out application performance at about 9000 calls per second.
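The arithmetic behind this ceiling can be sketched as follows. It assumes, as a simplification, that per-call overhead is the only limit and that the call rate stays fixed at the figure quoted above; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// A triangle strip with n vertices (n >= 3) contains n - 2 triangles.
inline int trianglesInStrip(int vertexCount) {
    return vertexCount < 3 ? 0 : vertexCount - 2;
}

// Upper bound on triangle throughput when every strip costs one draw call
// and the application can only issue callsPerSecond calls.
inline double maxTrianglesPerSecond(double callsPerSecond, int verticesPerStrip) {
    return callsPerSecond * trianglesInStrip(verticesPerStrip);
}

// Smallest strip length (in vertices) that can reach the target throughput
// at the given call rate.
inline int minVerticesPerStrip(double targetTrianglesPerSecond, double callsPerSecond) {
    assert(callsPerSecond > 0.0);
    const double trianglesPerCall =
        std::ceil(targetTrianglesPerSecond / callsPerSecond);
    return static_cast<int>(trianglesPerCall) + 2;
}
```

Under these assumptions, 9000 calls per second with 4-vertex strips (2 triangles each) caps out at 18,000 triangles per second, close to the 20,000 reported in the first post. Reaching 100,000 triangles per second at the same call rate would need at least 12 triangles (14 vertices) per strip.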

By increasing the number of triangles in a strip, I can get the higher count without incurring much additional overhead.
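One common way to get longer strips without regenerating the mesh data is to stitch short strips together with degenerate (zero-area) triangles, so that many strips go out in a single call. The post does not say this is how the strips were lengthened here, so treat the following as an illustrative sketch only. It assumes indexed strips (drawn with glDrawElements rather than glDrawArrays), and the names Index, stitchStrips and countRealTriangles are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using Index = unsigned int;

// Merge several triangle strips into one index list.
// Between two strips, the last index of the previous strip and the first
// index of the next strip are repeated. The triangles formed across the
// seam each contain a repeated index, so they have zero area.
std::vector<Index> stitchStrips(const std::vector<std::vector<Index>>& strips) {
    std::vector<Index> merged;
    for (const auto& strip : strips) {
        if (strip.size() < 3) continue;          // nothing to draw
        if (!merged.empty()) {
            merged.push_back(merged.back());     // repeat previous strip's last index
            merged.push_back(strip.front());     // repeat next strip's first index
            // Winding alternates with position in a strip. The next strip must
            // start at an even position to keep its original winding.
            if (merged.size() % 2 != 0) {
                merged.push_back(strip.front());
            }
        }
        merged.insert(merged.end(), strip.begin(), strip.end());
    }
    return merged;
}

// Count triangles in a strip whose three indices are all distinct,
// i.e. the ones that are not degenerate seam triangles.
std::size_t countRealTriangles(const std::vector<Index>& strip) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        const Index a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (a != b && b != c && a != c) ++count;
    }
    return count;
}

// The merged list would then be drawn with one call, for example:
//   glDrawElements(GL_TRIANGLE_STRIP, (GLsizei)merged.size(),
//                  GL_UNSIGNED_INT, merged.data());
```

The degenerate triangles cost a few extra indices per seam, which is usually cheap compared with the per-call overhead they remove. With glDrawArrays the same idea applies, but the repeated vertices have to be duplicated in the vertex arrays themselves.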

I had simply been working too directly with the code to see the issue.

Thanks for the various tips and hints,
– Jeff