Increasing rendering speed

Hello friends,
I’m searching now for ways to increase rendering speed. My engine is using an octree. Currently, I am looping through all the nodes within the view frustum, drawing each polygon. Polygons are presorted by texture ID, so each texture is selected only once per node that uses it. My texture manager only binds a texture if it is not currently bound.
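The redundant-bind check described above can be sketched as follows. The type and function names are hypothetical, and the GL call is shown only as a comment so the sketch runs without a GL context. A real manager would track one bound id per texture unit and would have to reset its cache if other code binds textures directly.

```c
#include <assert.h>

/* Hypothetical texture manager state for a single texture unit. */
typedef struct {
    unsigned int bound;   /* texture id currently bound, 0 = none */
    int bind_calls;       /* number of real binds actually issued */
} TextureManager;

/* Binds `id` only if it is not already bound. */
static void tm_bind(TextureManager *tm, unsigned int id) {
    if (tm->bound == id) return;   /* redundant bind: skip the GL call */
    /* real code: glBindTexture(GL_TEXTURE_2D, id); */
    tm->bound = id;
    tm->bind_calls++;
}
```

Binding 1, 1, 2, 2, 1 in that order issues only three real binds.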
I am looking into the idea of using polygon pools. My idea is to keep pools that can hold up to 256 polygons each, one pool per texture used. During rendering I will copy the polygons into these pools and flush them either when they become full, or when I am finished traversing my tree.
I also plan to keep separate pools for polygons that need blending.
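A minimal sketch of the pool idea as described: 256 polygons per pool, one pool per texture, separate pools for blended polygons, and a flush when a pool fills or traversal ends. The `Polygon` layout, `MAX_TEXTURES`, and the counters are assumptions for illustration. `flush_pool` only counts what a real draw would submit.

```c
#include <assert.h>

#define POOL_CAPACITY 256   /* polygons per pool, as proposed above */
#define MAX_TEXTURES  16    /* hypothetical texture count for this sketch */

typedef struct { float x, y, z, s, t; } Vertex;
typedef struct { Vertex v[3]; int texture; int blended; } Polygon;

typedef struct {
    Polygon polys[POOL_CAPACITY];
    int count;
} Pool;

/* One opaque pool and one blended pool per texture. */
static Pool opaque_pools[MAX_TEXTURES];
static Pool blended_pools[MAX_TEXTURES];
static int flush_count = 0;   /* simulated draw calls */
static int polys_drawn = 0;

/* Stand-in for the real draw: bind `texture` once, submit all
   pool->count polygons as one batch, then empty the pool. */
static void flush_pool(Pool *pool, int texture) {
    (void)texture;
    if (pool->count == 0) return;
    flush_count++;
    polys_drawn += pool->count;
    pool->count = 0;
}

/* Called for each visible polygon during octree traversal. */
static void submit(const Polygon *p) {
    assert(p->texture >= 0 && p->texture < MAX_TEXTURES);
    Pool *pool = p->blended ? &blended_pools[p->texture]
                            : &opaque_pools[p->texture];
    pool->polys[pool->count++] = *p;
    if (pool->count == POOL_CAPACITY) flush_pool(pool, p->texture);  /* full */
}

/* Called once traversal is finished: opaque pools first, then blended. */
static void flush_all(void) {
    for (int t = 0; t < MAX_TEXTURES; t++) flush_pool(&opaque_pools[t], t);
    for (int t = 0; t < MAX_TEXTURES; t++) flush_pool(&blended_pools[t], t);
}
```

One caveat for the blending question: a blended pool that fills up mid-traversal gets drawn before opaque geometry submitted later, and blended polygons generally need back-to-front sorting. For that reason they are often collected for the whole frame and drawn last.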
Is this a good idea? Is it more trouble than it’s worth? Is there a better way to gain speed? What about polygons that need blending?
Any help on making a good, fast rendering engine would be appreciated. I’d like to be able to render a good 10k texture-mapped polys per frame (I’ll do my own lighting). I’ve tried many ideas, but none of them seem to go quite fast enough. I have a PIII 800, 302mb RAM, and a GeForce 2 MX.
I have dynamic game world content in mind.

Had to answer this one :)

I’m also programming a “dynamic” octree-based engine (which means that you can modify any vertex value: position, texture, color…), and I wanted it to be as fast as possible, so I’ve come to a solution that is very similar to your idea.

I have the concept of “rendering buffers” (RBs). These are equivalent to what you call “pools”. When I want to render an object, I push it to the RB. All the visible faces of the object are then added to the RB, and a list of active shaders (each associated with its primitives) is kept. I call these “rendering lists” (RLs). So, if I have an RB with 10 shaders and 50 polygons per shader, I’ll have one RB with 10 RLs and 50 primitives per RL. I also fill the vertex arrays at this point.
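A rough sketch of that layout, assuming each RB owns one shared vertex array and each RL stores 16-bit indices into it. The capacities, the `Face` type, and all function names are hypothetical, and the post does not say how its RBs are actually laid out. Pushing an object appends its visible faces to the RL of their shader and fills the vertex array at the same time.

```c
#include <assert.h>

#define RB_MAX_VERTICES 3000   /* hypothetical RB capacity */
#define RB_MAX_LISTS    32     /* hypothetical max active shaders per RB */

typedef struct { float x, y, z, s, t; } Vertex;
typedef struct { Vertex v[3]; int shader; int visible; } Face;

/* Rendering list: all primitives of one shader, as indices into the
   RB's shared vertex array. */
typedef struct {
    int shader;
    int count;
    unsigned short indices[RB_MAX_VERTICES];
} RenderList;

/* Rendering buffer: one vertex array plus one RL per active shader. */
typedef struct {
    Vertex vertices[RB_MAX_VERTICES];
    int vertex_count;
    RenderList lists[RB_MAX_LISTS];
    int list_count;
} RenderBuffer;

static RenderList *rb_list_for(RenderBuffer *rb, int shader) {
    for (int i = 0; i < rb->list_count; i++)
        if (rb->lists[i].shader == shader) return &rb->lists[i];
    assert(rb->list_count < RB_MAX_LISTS);   /* sketch: no overflow handling */
    RenderList *rl = &rb->lists[rb->list_count++];
    rl->shader = shader;
    rl->count = 0;
    return rl;
}

/* Pushes an object's visible faces. Returns 0 (adding nothing) if they
   don't fit, meaning the caller should flush the RB and push again. */
static int rb_push_object(RenderBuffer *rb, const Face *faces, int n) {
    int needed = 0;
    for (int i = 0; i < n; i++)
        if (faces[i].visible) needed += 3;
    if (rb->vertex_count + needed > RB_MAX_VERTICES) return 0;
    for (int i = 0; i < n; i++) {
        if (!faces[i].visible) continue;
        RenderList *rl = rb_list_for(rb, faces[i].shader);
        for (int k = 0; k < 3; k++) {
            rl->indices[rl->count++] = (unsigned short)rb->vertex_count;
            rb->vertices[rb->vertex_count++] = faces[i].v[k];  /* fill VA now */
        }
    }
    return 1;
}

/* Empties the RB after it has been flushed. */
static void rb_reset(RenderBuffer *rb) { rb->vertex_count = 0; rb->list_count = 0; }
```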

When the RB is full, I flush it. This operation is actually quite complex. Whereas most engines do a “for each shader, for each pass, render (shader, pass)”, I’m doing a “for each pass, for each RL, render (pass, RL)”. Given a pass, I sort the RLs (yes, I do it for each pass) by state switches (textures, blending modes, etc.) to minimize OpenGL state changes. I then set up the vertex arrays and render the RLs.
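The per-pass sort can be illustrated by sorting a pass's entries by a state key and counting how many state sets the resulting order needs. The `PassEntry` fields and the choice of blend mode as the primary key are assumptions; the post only says RLs are sorted "by state switches".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical render state of one RL within one pass. */
typedef struct {
    int texture;
    int blend_mode;
    int list_id;   /* which RL this entry refers to */
} PassEntry;

static int cmp_int(int a, int b) { return (a > b) - (a < b); }

/* Primary key: blend mode (assumed the costlier switch); then texture. */
static int compare_state(const void *a, const void *b) {
    const PassEntry *x = a, *y = b;
    int c = cmp_int(x->blend_mode, y->blend_mode);
    return c != 0 ? c : cmp_int(x->texture, y->texture);
}

/* Counts state sets (including the initial ones) for this draw order. */
static int count_state_changes(const PassEntry *e, int n) {
    int changes = 0;
    for (int i = 0; i < n; i++) {
        if (i == 0 || e[i].texture != e[i - 1].texture) changes++;
        if (i == 0 || e[i].blend_mode != e[i - 1].blend_mode) changes++;
    }
    return changes;
}

/* One pass: sort its RLs by state; a real renderer would then set up the
   vertex arrays once and draw each RL, applying only changed states. */
static int render_pass(PassEntry *entries, int n) {
    qsort(entries, n, sizeof *entries, compare_state);
    return count_state_changes(entries, n);
}
```

With textures arriving as 1, 2, 1, 2, the unsorted order costs five state sets and the sorted order costs three.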

To avoid synchronization problems when using vertex array range (the NV_vertex_array_range extension), I also use 4 RBs, plus many tricks to avoid refilling vertex arrays that have already been filled with the same content.
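The post doesn't detail its "tricks", so the following is only a hypothetical sketch of the general idea. Four buffers are handed out in round-robin order so the CPU writes to the one the GPU is least likely to still be reading. Each buffer remembers which object and which version of that object it holds, so unchanged content is reused instead of copied again. All names, and the object/version tagging scheme, are assumptions.

```c
#include <assert.h>

#define NUM_RBS 4   /* as in the post: four rendering buffers */

typedef struct {
    int object_id;  /* whose vertices this buffer holds (-1 = none) */
    int version;    /* bumped by the engine whenever those vertices change */
} RBContent;

typedef struct {
    RBContent content[NUM_RBS];
    int next;       /* round-robin cursor */
    int fills;      /* how many times vertex data actually had to be copied */
} RBRing;

static void ring_init(RBRing *r) {
    for (int i = 0; i < NUM_RBS; i++) {
        r->content[i].object_id = -1;
        r->content[i].version = 0;
    }
    r->next = 0;
    r->fills = 0;
}

/* Returns the buffer to draw from: reuse one already holding this exact
   object/version, otherwise refill the next (oldest) buffer. */
static int ring_acquire(RBRing *r, int object_id, int version) {
    for (int i = 0; i < NUM_RBS; i++)
        if (r->content[i].object_id == object_id &&
            r->content[i].version == version)
            return i;   /* same content: no copy needed */
    int slot = r->next;
    r->next = (r->next + 1) % NUM_RBS;
    /* A real VAR renderer would wait on a fence for `slot` here
       (e.g. with NV_fence) before overwriting it. */
    r->content[slot].object_id = object_id;
    r->content[slot].version = version;
    r->fills++;
    return slot;
}
```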

With this method, I get 8000 texture-mapped, lit tris @ 15 fps, in debug mode and with individual triangles (no tri strips yet). This is on a P200 + Voodoo2 only, so I guess 10k tris @ 100 fps is certainly reachable on a P3-800 + GF2.
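A quick back-of-the-envelope check of that estimate, using only the numbers given above:

```latex
\begin{aligned}
8000 \times 15 &= 1.2 \times 10^{5}\ \text{tris/s} && \text{(P200 + Voodoo2, debug build)}\\
10^{4} \times 100 &= 10^{6}\ \text{tris/s} && \text{(target)}\\
\frac{10^{6}}{1.2 \times 10^{5}} &\approx 8.3
\end{aligned}
```

So the target needs roughly 8× the measured throughput. The move from a P200 to a P3-800 covers 4× in clock speed alone. A release build, triangle strips, and the GeForce2's hardware transform would plausibly cover much of the rest, though that part is an estimate, not a measurement.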


Why copy stuff at all?

Copying is slow…

I did a quadtree where I had a global list of polygons, and each leaf node contained a list of indexes into that polygon list. Thing was, I was only using immediate mode, but it was right fast.

If you do that, you can render an entire leaf node with one glDrawElements call, using only one global list of polygons. Should be fast. Or maybe compile each leaf node into a display list, so the driver can keep it all in VRAM/AGP memory. Then just traverse your tree, calling glCallList at each leaf node.
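A small sketch of the "global list plus per-leaf index lists" layout. It assumes triangles are assigned to a leaf by their centroid in the ground (x, z) plane. The type names and that assignment rule are hypothetical, since the post doesn't say how polygons were assigned. Once the vertex pointer is set to the global vertex array, each visible leaf would be drawn with one `glDrawElements(GL_TRIANGLES, leaf->index_count, GL_UNSIGNED_SHORT, leaf->indices)`, with no vertex data copied per frame.

```c
#include <assert.h>

#define MAX_LEAF_INDICES 1024   /* hypothetical per-leaf limit */

typedef struct { float x, y, z; } Vec3;

/* Global vertex data for the whole level, shared by all leaves. */
typedef struct {
    const Vec3 *vertices;
    int vertex_count;
} GlobalMesh;

typedef struct {
    float min_x, min_z, max_x, max_z;            /* leaf bounds, ground plane */
    unsigned short indices[MAX_LEAF_INDICES];    /* triangles as index triples */
    int index_count;
} QuadLeaf;

/* Copies into the leaf the indices of every triangle whose centroid lies
   inside the leaf's bounds. Done once at load time, not per frame. */
static void build_leaf(QuadLeaf *leaf, const GlobalMesh *mesh,
                       const unsigned short *tri_indices, int tri_count) {
    leaf->index_count = 0;
    for (int t = 0; t < tri_count; t++) {
        const unsigned short *tri = &tri_indices[3 * t];
        float cx = 0.0f, cz = 0.0f;
        for (int k = 0; k < 3; k++) {
            cx += mesh->vertices[tri[k]].x;
            cz += mesh->vertices[tri[k]].z;
        }
        cx /= 3.0f;
        cz /= 3.0f;
        if (cx >= leaf->min_x && cx < leaf->max_x &&
            cz >= leaf->min_z && cz < leaf->max_z) {
            assert(leaf->index_count + 3 <= MAX_LEAF_INDICES);
            for (int k = 0; k < 3; k++)
                leaf->indices[leaf->index_count++] = tri[k];
        }
    }
}
```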

That's what I would do anyway…
I don't have much experience with quad/octrees, though…

Ysaneya - Thank you for your reply, you’ve got me thinking.

Nutty - Display lists are out because of my dynamic lighting and texture mapping requirements. I am currently using vertex arrays for each node of the octree, but it’s not all that fast. I’d like to reduce texture uploads and state changes.

EH?? You can use dynamic lights with display lists if you’re using GL lighting.
Or are you doing your own vertex lighting?

Yes, I will do all my own lighting calculations (unless somebody can tell me how to set up GL lights with starting and ending attenuation values, you know, like a Max omni light).

I will also be dynamically modifying texture coordinates for some geometry. Maybe I can use the texture matrix, though; I’ll have to look into that. If I can use the texture matrix, then I could possibly break terrain into display lists…
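To see why the texture matrix would help: OpenGL multiplies each (s, t, r, q) texture coordinate by the current texture matrix, so a scroll or scale can change per frame while the stored coordinates, and therefore a display list, stay untouched. The sketch below reproduces that math on the CPU for a translate-then-scale matrix, equivalent to `glTranslatef(ds, dt, 0)` followed by `glScalef(ss, st, 1)` under `glMatrixMode(GL_TEXTURE)`. The helper names are hypothetical, and this only covers changes expressible as one matrix per object, not arbitrary per-vertex edits.

```c
#include <assert.h>

/* Column-major 4x4 matrix, the layout OpenGL uses (e.g. glLoadMatrixf). */
typedef struct { float m[16]; } Mat4;

/* T * S: texture coordinates are scaled first, then translated. */
static Mat4 tex_scroll_scale(float ds, float dt, float ss, float st) {
    Mat4 r = {{ ss, 0, 0, 0,
                0, st, 0, 0,
                0, 0, 1, 0,
                ds, dt, 0, 1 }};
    return r;
}

/* Applies M to (s, t, r = 0, q = 1); q stays 1 for this affine matrix. */
static void transform_texcoord(const Mat4 *M, float s, float t,
                               float *out_s, float *out_t) {
    *out_s = M->m[0] * s + M->m[4] * t + M->m[12];
    *out_t = M->m[1] * s + M->m[5] * t + M->m[13];
}
```

For example, scrolling by 0.25 in s with a 2× scale maps (0.5, 0.5) to (1.25, 0.5).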

FWIW, I dropped doing my own lighting a couple of months ago (after nearly 1.5 years of doing it).
Why? Cards seem to be increasing in speed at a greater rate than CPUs.
Using OpenGL’s lighting instead of my own decreased my engine’s speed by about 10% on my non-hardware-T&L card. I imagine that on a hardware T&L card it would have increased the speed. Also, GL lighting tends to look better for positional lights (at first I was using a quick-and-dirty hack, because they’re expensive to work out in software).

OK, I’d be perfectly happy using OpenGL lighting, but how do I emulate Max’s omni lights? I need to make sure that when an artist lights a level in Max and exports it, it will look the same in the engine as it does in Max (or at least pretty close). Any ideas?

AFAIK an omni light in Max is the same as a point light in OGL.

A point light in GL has constant, linear, and quadratic attenuation coefficients. An omni light in Max has starting and ending distances for near and far attenuation. You can’t always achieve the same effect with the two different styles. DirectX supports both types of attenuation calculation (I think; I’m not sure about both, but it can do it like the Max omnis).
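The mismatch is easier to see side by side. GL's fixed-function factor for a positional light is 1 / (kc + kl·d + kq·d²), set through `GL_CONSTANT_ATTENUATION`, `GL_LINEAR_ATTENUATION` and `GL_QUADRATIC_ATTENUATION`. It starts falling immediately and never reaches exactly zero. A start/end range instead stays at full intensity up to a start distance and is zero past the end. The range function below *assumes* a linear ramp between the two distances; Max's exact curve is not given in this thread. If you do your own per-vertex lighting, you can compute the range factor directly.

```c
#include <assert.h>

/* OpenGL fixed-function point-light attenuation factor. */
static float gl_attenuation(float d, float kc, float kl, float kq) {
    return 1.0f / (kc + kl * d + kq * d * d);
}

/* Range-style far attenuation, ASSUMING a linear ramp: full intensity
   up to far_start, zero from far_end on. */
static float range_attenuation(float d, float far_start, float far_end) {
    if (d <= far_start) return 1.0f;
    if (d >= far_end) return 0.0f;
    return (far_end - d) / (far_end - far_start);
}
```

With a 10 to 20 unit range, the light is at full strength at 5 units, half at 15, and off at 25. Any positive GL coefficients still leave some light at the "end" distance.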

> If you do that, you can render the entire leaf node using glDrawElements, with only 1 list of polygons. Should be fast.

Yeah, I know, I’ll give it a try, but won’t that be cache-unfriendly? I mean, say you have a list of faces ranging from ID 0 to ID 10, and faces 1, 2, 4, 5, 6, 8 and 10 are visible… won’t it slow down performance, since they’re not contiguous? It is also written that vertex arrays should not be more than 64 KB of memory (~1000 vertices), but if I have 10000 faces in my object and only 1000 of them are visible, shouldn’t I copy the visible faces into another array and render that array for better performance?
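A middle ground between the two options is to copy only the *indices* of the visible faces into a contiguous index array (6 bytes per triangle with 16-bit indices) and keep the vertices where they are. Whether scattered vertex reads actually hurt, and whether the 64 KB guideline applies, depends on the card and driver, so this is worth measuring. The function name and the one-triangle-per-face layout are assumptions for the sketch.

```c
#include <assert.h>

/* Gathers the three indices of each visible face into `out`, which must
   hold 3 * face_count entries. Returns the number of indices written;
   the result could be passed as-is to glDrawElements. */
static int gather_visible(const unsigned short *face_indices,
                          const unsigned char *visible, int face_count,
                          unsigned short *out) {
    int n = 0;
    for (int f = 0; f < face_count; f++) {
        if (!visible[f]) continue;   /* skip hidden faces */
        out[n++] = face_indices[3 * f];
        out[n++] = face_indices[3 * f + 1];
        out[n++] = face_indices[3 * f + 2];
    }
    return n;
}
```

Using the example from the post (faces 0 to 10, with 1, 2, 4, 5, 6, 8 and 10 visible), the gather produces 7 triangles, or 21 indices, in one contiguous array.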