Speeding up repeat rendering

I’d like your opinion on the best way to approach a very specific rendering problem. What I’m doing is rendering a mesh with a polygonal sphere at each vertex point. So the mesh itself is never rendered, just a lot of identical spheres. They are untextured but lit with one light. The contents of the scene rarely change; only the view is altered.

At the moment I’m compiling the scene into a display list with standard gluSphere() calls for each sphere. I’ve written a sphere generation algorithm so I can use vertex arrays instead of gluSphere(), but this makes no difference.

My question is, can I expect any difference in rendering speed between gluSphere(), vertex arrays and VBOs if the scene is in a display list? Or will it just be the same display list no matter what?


If you are targeting somewhat faster HW (>GF3 or >9500 Pro) you could also use depth sprites.
This means: instead of a sphere, draw only a textured quad, and use a texture shader (GF3/GF4) or a fragment program to create the correct depth value for the current fragment. You could store these depth values in a depth texture, but with fragment programs available it may be easier to compute the value on the fly… (assuming you are using spheres only)

There is an example of this method in the NVIDIA SDK; just take a look at it.
(This should improve your performance a lot…)

here are some (older) links: http://www.ati.com/developer/samples/DepthExplosion.html http://developer.nvidia.com/object/gdc2001_texture_shaders.html

It’s a good idea, and I considered something similar, but this has to work on a TNT and up, and I don’t have the resources to implement this feature just for fast hardware. Thanks for the links anyway.


About your display list question: it will just be the same display list no matter which way you compile it!


Okay then, might using VBOs be faster than using a display list? I know this has been discussed before, but I’m just not clear on it.

This may be a stupid question: do you use a DL for each sphere, or one DL that you then call repeatedly? The latter is the best way to do it.

Also, depending on how many spheres there are, and how complex your sphere is, you may have reached the maximum performance of a TNT. If the spheres are very small, make sure the number of subdivisions in your sphere is also small.

Also consider a spatial subdivision algorithm, which partitions your spheres into regions of space. Then you can cull each region if it is occluded or outside the view frustum.
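The frustum test for a region’s bounding sphere is cheap; a minimal sketch (my own struct and names, assuming planes stored with unit normals): a region is culled when it is entirely on the negative side of any of the six frustum planes.

```c
/* Plane stored as (a,b,c,d) with unit normal (a,b,c), so the signed
 * distance of point p from the plane is a*px + b*py + c*pz + d. */
typedef struct { double a, b, c, d; } Plane;

/* Returns 1 if the bounding sphere (center cx,cy,cz, given radius) is
 * entirely behind the plane and the region can be skipped. */
int sphere_behind_plane(const Plane *p,
                        double cx, double cy, double cz, double radius)
{
    double dist = p->a * cx + p->b * cy + p->c * cz + p->d;
    return dist < -radius;
}
```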

Again, please forgive the simple suggestions I have provided.

Not at all, everything needs to be considered. I am indeed repeatedly calling the same sphere DL, after trying both and finding this way was faster.

As far as maximum performance goes, though I’m currently developing this on a TNT2, it will mostly be used by people on Quadro 4s. I can test on a Quadro 4 too.

There could be anywhere from 50 (very simple case) to 300,000 (for a proper run) spheres on screen. The aim, therefore, is to reduce the wait between frames rather than achieve real-time framerates.

Do not judge performance on a TNT2 if the target is a workstation board that is four generations newer and optimized for the things professional workstation apps do, display lists being one of them.
The TNT2 doesn’t even have a transform and lighting engine to begin with. It might not matter much if you call one display list or thousands on the TNT2, it’ll be CPU bound most of the time.
The Quadros will be factors faster and depending on what you’re doing exactly it might matter if there’s one or many display lists. You should definitely switch to the target system for development.

I’m painfully aware of the inability of this card to run the program. Painfully. What I have to do is write and compile the code here, then test it on the newer machines. It’s a slow process. This being a TNT2 M64, the program is fill-rate limited unless I zoom way out.

But relic makes a good point. If you must develop on a TNT2, then there is no other way around it. But a transform & lighting engine will make a huge difference, especially for fewer models with fewer vertices. Trying to extrapolate performance based on TNT2 results is a mistake.

Perhaps you could try shamelessly asking for an upgrade on your hardware.

Developer time is money.

But as I just said, I’m not using any results from the TNT2 machine. It’s just for coding on. Though I can and do run it on here from time to time, I only ever pay attention to performance results on the Quadro machines. I’m not trying to extrapolate anything.

Oops, misread your post.

Since the spheres are untextured (and presumably mono-colored before lighting effects), you might try rendering only the front-facing half of a sphere instead of a whole one. You should be able to build a single half-sphere and rotate each one just like one would rotate sprites in the image plane. It may be acceptable to use the same rotation for all spheres if your FOV is narrow enough, in which case the CPU cost would be no greater than now. Either way your fill and transform times would most likely improve, potentially at the cost of CPU.

The next question is on lighting. Is it a local light or infinite? If infinite and you don’t care about specular, I’d suggest using a VBO instead of a DL, turning off HW lighting, computing the diffuse component for your single half-sphere on the CPU (only when the light moves) and uploading this to be used everywhere. Turning off hardware lighting may improve things when you’re not fill limited, especially on TNT-class HW.

If you want to add specular to that, you might consider using an additive specular texture (a Gaussian hot-spot), though I’d be wary of this if you’re consistently fill-limited. The rotation of this texture can be computed once via texture coords if you don’t care about perfect realism (or have an infinitely far light) or use the texture matrix to rotate per sphere instance based on the view/local-light vector.

Of course, you could do the diffuse lighting that way too, either one texture for diffuse only or two textures for diffuse + specular (but two textures in one pass won’t work on older hardware). But lighting+no_texture may be faster than texture+no_lighting on your systems. Something to test.

If nothing else, I’d go with the half-spheres. If you play your cards right, you can even come up with a clever way to have extra polygons near the silhouette for that smooth look, but few near the center to reduce your overall number of verts.


[This message has been edited by Cyranose (edited 02-04-2004).]

Thanks for the reply, you seem to have read my mind! I was just implementing hemispheres instead of spheres today! The view is never perspective, so all hemispheres have the same rotation.

Can I ask what you mean by turning off hardware lighting and computing it on the cpu? The light is indeed infinite. The cpu has plenty free, so it would be good to know how to do this.


[This message has been edited by Alan_Grey (edited 02-05-2004).]

Originally posted by Alan_Grey:
Can I ask what you mean by turning off hardware lighting and computing it on the cpu? The light is indeed infinite. The cpu has plenty free, so it would be good to know how to do this.

Basic hardware lighting does a simple calculation per vertex using the position of the vertex and light and the normal. The result is a color per vertex that represents the intensity of the light. The diffuse component is the simplest, especially for infinite lights. It’s just dot(normal,light_dir), but note the signs. Clamp that to [0,1] and multiply by the original object color, even add an ambient component if you wish. This is all documented in the spec and various other places, so you can recreate as little or as much of the basic lighting equation as you wish, including the specular component, which takes the viewer direction into account too.

Compute a lit color per vertex and draw your hemispheres with these per-vertex colors and no material and no HW lights enabled. The result should look exactly like basic per-vertex HW lighting.
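The per-vertex calculation described above boils down to a few lines; a sketch for the infinite-light case (helper names are mine; light_dir is the unit vector toward the light, matching the sign note above):

```c
/* Per-vertex CPU diffuse lighting for a directional (infinite) light:
 * out = ambient + clamp(dot(n, l), 0, 1) * base, clamped per channel. */
void lit_vertex_color(const double n[3], const double light_dir[3],
                      const double base[3], double ambient,
                      double out[3])
{
    double d = n[0] * light_dir[0] + n[1] * light_dir[1] + n[2] * light_dir[2];
    if (d < 0.0) d = 0.0;
    if (d > 1.0) d = 1.0;
    for (int i = 0; i < 3; ++i) {
        out[i] = ambient + d * base[i];
        if (out[i] > 1.0) out[i] = 1.0;
    }
}
```

Since this only needs to rerun when the light moves, the per-frame cost is zero: the resulting colors live in the VBO alongside the positions.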

The possibly better looking version is to ignore the vertices altogether and use a lightmap texture that holds the dot(normal,light_dir)*base_color result per pixel. Compute this once or draw it in photoshop and rotate the brightest end towards your light (by rotating the texture coords, either on the CPU or with a texture matrix, the former is advised here since you only need to do it once). This will give a smoother look, even for low poly-counts, so you might save some geometry with this approach.

However, drawing with no lighting and no texture is most likely faster than using even one texture as a lightmap if you’re fill limited. You’d have to test.

Good luck.


Cyranose - thanks again for your help, I will try this if I have time on the project.


You can use the Jim Blinn trick; limit the sampling on your hemispheres. Just put vertices along the silhouette edge and one extra vertex where you have the lighting “peak.” If you’re using specular, you need to add another one for that “peak.”

If both the light and viewer are distant (infinite light, ortho projection), then all you need is a single display list that you can use for all your spheres.

It’s probably very cheap to compute the distance to each sphere, in which case you can do an LOD trick where farther spheres use fewer verts.
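A minimal sketch of that LOD pick (the thresholds and counts here are arbitrary placeholders, not tuned values): map the eye-to-center distance onto a subdivision count, then call the matching pre-built display list.

```c
/* Distance-based LOD: returns the number of subdivisions to use for a
 * sphere whose center is `dist` units from the eye. Thresholds are
 * hypothetical and would need tuning per scene. */
int sphere_lod_subdivisions(double dist)
{
    if (dist < 10.0)  return 16;  /* close: smooth silhouette    */
    if (dist < 50.0)  return 8;
    if (dist < 200.0) return 4;
    return 2;                     /* far: a few pixels, be cheap */
}
```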