Why is my Q3 Level Renderer slow?

Hi!
I have managed to create a simple Quake3 level renderer. Textures and lightmaps are applied successfully and I think the PVS and frustum culling work great, but it's still very slow (sometimes approx. 15 fps).
The code below is my BSP Draw() function:

void BSP::Draw(float PosX, float PosY, float PosZ)
{
Frustum->CalculateFrustum();

CameraPoint[1] = PosX;
CameraPoint[2] = PosY;
CameraPoint[0] = PosZ;

int CameraLeaf=0;

CameraLeaf = FindLeaf();

if (CameraLeaf < 0)
	return;

if (Leaves[CameraLeaf].cluster < 0) // camera leaf has no valid cluster (e.g. outside the map)
	return;

glPushMatrix();
glTranslatef(PositionX, PositionY, PositionZ);

int i=0;
int x=0;
int j=0;
int k=0;
int CurrFace=0;
int n_leaffaces=0;

for (x=0; x<NumLeaves; x++) // For every leaf in the level
{
	if (TESTVIS(Leaves[CameraLeaf].cluster, Leaves[x].cluster)) // Test to see if leaf MAY be visible
	{	
		if (InFrustum(Leaves[x].mins, Leaves[x].maxs)) // Test to see if leaf IS visible
		{
			n_leaffaces = Leaves[x].n_leaffaces; // Get number of faces in the current leaf

			for (j = 0; j < n_leaffaces; j++) // For every face in that leaf
			{
				CurrFace = LeafFaces[Leaves[x].leafface+j].face;

				if (Faces[CurrFace].type == 1 && Faces[CurrFace].lm_index > -1)
				{
					glActiveTextureARB(GL_TEXTURE0_ARB);
					glEnable(GL_TEXTURE_2D);
					TexData[Faces[CurrFace].texture].Use();	

					glActiveTextureARB(GL_TEXTURE1_ARB);
					glEnable(GL_TEXTURE_2D);
					glBindTexture(GL_TEXTURE_2D, LightMapTextures[Faces[CurrFace].lm_index].ID);
				
					glTexEnvf( GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE,	GL_ADD );

					glBegin( GL_TRIANGLE_FAN);
	
					for(k = Faces[CurrFace].vertex; k < Faces[CurrFace].vertex + Faces[CurrFace].n_vertexes; k++)
					{
						glColor4f(Vertices[k].color[0], Vertices[k].color[1], Vertices[k].color[2], Vertices[k].color[3]);
				
						glMultiTexCoord2fARB(GL_TEXTURE0_ARB, Vertices[k].texcoord[0][0], Vertices[k].texcoord[0][1]);
						glMultiTexCoord2fARB(GL_TEXTURE1_ARB, Vertices[k].texcoord[1][0], Vertices[k].texcoord[1][1]);
				
						glNormal3f(Vertices[k].normal[0], Vertices[k].normal[1], Vertices[k].normal[2]);
						glVertex3f(Vertices[k].position[1], Vertices[k].position[2], Vertices[k].position[0]);
					}	
			
					glEnd();					
				}
			}
		}
	}
}

glActiveTextureARB(GL_TEXTURE1_ARB);
glDisable(GL_TEXTURE_2D);

glActiveTextureARB(GL_TEXTURE0_ARB);
glDisable(GL_TEXTURE_2D);

glEnable(GL_TEXTURE_2D);

glPopMatrix();

}

The TESTVIS macro is from the Aftershock engine:

#define TESTVIS(from,to) \
	(*(VisData->vecs + (from)*VisData->sz_vecs + \
	((to)>>3)) & (1 << ((to) & 7)))
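The macro treats the PVS as a bit matrix: the row for cluster `from` is `sz_vecs` bytes long, and bit `to` in that row is set when cluster `to` may be visible from `from`. Here is a minimal sketch of the same test as an inline function. It assumes a hypothetical `VisInfo` struct that mirrors the visdata fields the macro reads (`sz_vecs`, `vecs`) plus the row count `n_vecs`. Unlike the macro, it returns false for negative clusters instead of indexing with them:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the Q3 visdata lump: n_vecs rows of sz_vecs bytes each.
struct VisInfo {
    int n_vecs = 0;                  // number of clusters (rows)
    int sz_vecs = 0;                 // bytes per row
    std::vector<std::uint8_t> vecs;  // n_vecs * sz_vecs bytes of visibility bits
};

// Same bit test as TESTVIS(from, to): bit `to` of row `from`.
// Negative clusters (no valid cluster) are reported as not visible.
inline bool ClusterVisible(const VisInfo& vis, int from, int to) {
    if (from < 0 || to < 0) return false;
    assert(from < vis.n_vecs && (to >> 3) < vis.sz_vecs);
    const std::size_t byte_index =
        static_cast<std::size_t>(from) * vis.sz_vecs + static_cast<std::size_t>(to >> 3);
    return (vis.vecs[byte_index] & (1u << (to & 7))) != 0;
}
```

In the leaf loop, `ClusterVisible(vis, Leaves[CameraLeaf].cluster, Leaves[x].cluster)` would take the place of the `TESTVIS` call.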

Please help me! It really bugs me!

Best regards
Humle

Just a couple of suggestions:

  1. Use glVertex3fv and glColor3fv (actually glColor4ubv may be best, since Q3 stores vertex colors as unsigned bytes). And do you really need glNormal at all? Nothing in that code suggests you need normals.

OR

  2. Stop using immediate mode. Batch your polygons together into a vertex array (and associated arrays), then feed the arrays to OpenGL all at once.
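Suggestion 2 fits the Q3 data well: the vertex lump is already a packed array, so the gl*Pointer calls can point straight into it with a stride. Below is a sketch of that stride and the offsets. It assumes the standard 44-byte layout of the Q3 vertex lump, which holds the same fields the original code reads (position, texcoord[2][2], normal, color); the names `BspVertex` and `ComputeOffsets` are hypothetical. The original code also swaps position components (`position[1], position[2], position[0]`). With arrays, that swap would have to be done once at load time or expressed as a rotation in the modelview matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout of one entry in the Q3 BSP vertex lump (44 bytes, no padding).
struct BspVertex {
    float position[3];
    float texcoord[2][2];  // [0] = surface texture, [1] = lightmap
    float normal[3];
    std::uint8_t color[4];
};
static_assert(sizeof(BspVertex) == 44, "Q3 BSP vertices are 44 bytes");

// Stride and byte offsets for interleaved arrays, used after enabling each
// array with glEnableClientState, e.g. (base = (const char*)Vertices):
//   glVertexPointer(3, GL_FLOAT, o.stride, base + o.position);
//   glClientActiveTextureARB(GL_TEXTURE0_ARB);
//   glTexCoordPointer(2, GL_FLOAT, o.stride, base + o.texcoord0);
//   glClientActiveTextureARB(GL_TEXTURE1_ARB);
//   glTexCoordPointer(2, GL_FLOAT, o.stride, base + o.texcoord1);
//   glColorPointer(4, GL_UNSIGNED_BYTE, o.stride, base + o.color);
struct InterleavedOffsets {
    std::size_t stride, position, texcoord0, texcoord1, normal, color;
};

inline InterleavedOffsets ComputeOffsets() {
    return {sizeof(BspVertex),
            offsetof(BspVertex, position),
            offsetof(BspVertex, texcoord),
            offsetof(BspVertex, texcoord) + 2 * sizeof(float),
            offsetof(BspVertex, normal),
            offsetof(BspVertex, color)};
}
```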

Dfrey's #2 is the one. Think about it: you're making 5 calls to OpenGL for every vertex in your scene, and vertices shared between faces are transformed and lit more than once in immediate mode.
Use indexed vertex arrays for your triangle lists or, even better, stripify your geometry at level load time and use non-indexed triangle strips (that's the fastest choice in most cases).
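For the indexed-vertex-array route, each visible polygon face can be appended to one shared `GL_TRIANGLES` index list. A triangle fan over n vertices is n - 2 triangles, so the conversion is mechanical. The sketch below uses a hypothetical helper, `AppendFanAsTriangles`. Q3 maps also ship a ready-made triangle list per face in the meshverts lump (offsets relative to the face's first vertex), which can be used instead of this conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a convex polygon stored as a triangle fan (vertices first_vertex ..
// first_vertex + n_vertexes - 1) to a shared GL_TRIANGLES index list, so many
// faces can later be drawn with one glDrawElements call.
void AppendFanAsTriangles(std::vector<std::uint32_t>& indices,
                          std::uint32_t first_vertex, std::uint32_t n_vertexes) {
    for (std::uint32_t i = 1; i + 1 < n_vertexes; ++i) {
        indices.push_back(first_vertex);          // fan center
        indices.push_back(first_vertex + i);
        indices.push_back(first_vertex + i + 1);
    }
}
```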

Hi,
what is WAY more important than the points mentioned before is that you do not batch up your faces by texture/lightmap, but instead do all the setup code for each and every face you render.
That's what is slowing you down.
So, once you've found that a face is visible, put it into a kind of hash with one slot for every texture/lightmap combination.
After you're done visiting the tree, step through this hash, do the setup code for each slot (yes, you should use vertex arrays) and then render the whole slot. This way you have far fewer texture changes, and that is really going to speed you up.
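To see why batching matters, count the texture/lightmap binds that each draw order needs. Martin suggests a hash. Sorting the visible faces by a combined (texture, lightmap) key is a swapped-in alternative that produces the same grouping. In the sketch, `VisibleFace` is a hypothetical record holding the two indices the original code reads from `Faces[CurrFace]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct VisibleFace {
    int face;      // index into the face lump
    int texture;   // Faces[face].texture
    int lm_index;  // Faces[face].lm_index
};

// Key that is equal exactly when both texture and lightmap are equal.
inline std::uint64_t StateKey(const VisibleFace& f) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(f.texture)) << 32) |
           static_cast<std::uint32_t>(f.lm_index);
}

// Number of texture/lightmap (re)binds needed to draw the faces in this order.
int CountStateChanges(const std::vector<VisibleFace>& faces) {
    int changes = 0;
    for (std::size_t i = 0; i < faces.size(); ++i)
        if (i == 0 || StateKey(faces[i]) != StateKey(faces[i - 1])) ++changes;
    return changes;
}

// Groups faces with identical state next to each other.
void SortByState(std::vector<VisibleFace>& faces) {
    std::sort(faces.begin(), faces.end(), [](const VisibleFace& a, const VisibleFace& b) {
        return StateKey(a) < StateKey(b);
    });
}
```

The original `Draw()` does a full bind sequence for every visible face, so its bind count equals the number of visible faces. After grouping, the count drops to the number of distinct (texture, lightmap) pairs.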

With vertex arrays and today's hardware you can draw an entire Q3 level every frame (ignoring the BSP and VIS data) at 50 fps.

Thank you for your response, but Martin, please write some pseudo-code… I'm lost! :)

Best regards
Humle

Why don’t you use the BSP nodes as well as the leaves for frustum culling? They are much more efficient, as you can cull a maximum of half the map away if it isn’t in the view frustum.

Hi,
uhmm, the code for this is rather complicated… let's see (simplified):

for each leaf
	if (leaf_visible)
	{
		for each face in leaf
		{
			if face not already in facelist, push it to the facelist slot for its texture/lightmap
		}
	}

when done visiting:
for each facelist slot
{
	if (!slot_empty)
	{
		set texture
		set lightmap
		for each face in slot
		{
			push to vertex array;
		}
		render current vertex array;
	}
}
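Below is a C++ sketch of the pseudocode above, under these assumptions. `Face` is a hypothetical subset of the Q3 face lump. A `std::map` keyed by (texture, lightmap) stands in for the hash. "If not already in facelist" is implemented with a per-face frame stamp, because the same face can be referenced by several visible leaves. Each slot collects `GL_TRIANGLES` indices into the static vertex array, so nothing is copied except indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical subset of a Q3 face: texture, lightmap and its fan of vertices.
struct Face {
    int texture, lm_index, vertex, n_vertexes;
};

// One slot per (texture, lightmap) pair, holding GL_TRIANGLES indices.
using Slots = std::map<std::pair<int, int>, std::vector<std::uint32_t>>;

class FaceBatcher {
public:
    explicit FaceBatcher(std::size_t face_count) : last_frame_(face_count, -1) {}

    // Call once per frame before visiting the leaves.
    void BeginFrame() {
        ++frame_;
        slots_.clear();
    }

    // Call for every face of every visible leaf. A face referenced by several
    // leaves is only added once per frame ("if not already in facelist").
    void AddFace(const std::vector<Face>& faces, int face_index) {
        assert(face_index >= 0 && static_cast<std::size_t>(face_index) < last_frame_.size());
        if (last_frame_[face_index] == frame_) return;
        last_frame_[face_index] = frame_;

        const Face& f = faces[face_index];
        std::vector<std::uint32_t>& idx = slots_[{f.texture, f.lm_index}];
        for (int i = 1; i + 1 < f.n_vertexes; ++i) {  // triangle fan -> triangle list
            idx.push_back(static_cast<std::uint32_t>(f.vertex));
            idx.push_back(static_cast<std::uint32_t>(f.vertex + i));
            idx.push_back(static_cast<std::uint32_t>(f.vertex + i + 1));
        }
    }

    // When done visiting: for each non-empty slot, bind its texture (unit 0)
    // and lightmap (unit 1) once, then draw it with a single call such as
    //   glDrawElements(GL_TRIANGLES, (GLsizei)idx.size(), GL_UNSIGNED_INT, idx.data());
    const Slots& slots() const { return slots_; }

private:
    std::vector<int> last_frame_;  // frame in which each face was last queued
    int frame_ = 0;
    Slots slots_;
};
```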

That's about it… for real code: www.planequake.com/aftershock

Martin, what is a slot, and should there be one slot for each face?
Btw, DeathWish's reply isn't true, right?

Hi,
you need one slot for every texture (or shader)/lightmap combination. All faces with the same texture and lightmap go into one slot; that is the whole point of slots, to batch up the faces that share the same state.
I can't explain every detail here, also because it's not fully on topic in this forum. Check the source, it's all there.
What reply from DeathWish?!

Originally posted by DeathWish:
Why don’t you use the BSP nodes as well as the leaves for frustum culling? They are much more efficient, as you can cull a maximum of half the map away if it isn’t in the view frustum.

You misunderstood my intention.
I didn't recommend ignoring BSP & VIS.

I wrote my program just to extract data; performance wasn't a high priority.
Rendering was a simple test to check whether I had interpreted the data correctly.
I'm not trying to excel at writing a Q3 clone.

Here is what my simple, dumb renderer does (have a laugh, if you wish):

  - packs all geometry into a single array (didn't even bother with CVA or VAR)
  - packs all lightmaps into a single texture (1024x1024 can hold 64 of them; all maps I've seen have fewer)
  - draws ONLY base geometry with lightmaps, plus its own skybox
  - no shaders, textures, models, etc.
  - no visibility or frustum culling (ignores BSP & VIS)
  - draws the entire map every frame with a single glDrawElements() call
  - windowed (about 1000x700), OnIdle driven
  - every 3x3 Bezier patch tessellated to 8x8 vertices
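Packing the lightmaps into one texture means each face's lightmap coordinates must be remapped into its tile. Q3 lightmaps are 128x128, so a 1024x1024 texture holds an 8x8 grid of 64 of them, which matches the figure in the list. The sketch assumes row-major tile placement; the `ToAtlasUV` helper and the placement order are illustrative choices, not taken from the post. The remap can be applied once to `texcoord[1]` at load time.

```cpp
#include <cassert>

// Q3 lightmaps are 128x128; an 8x8 grid of them fills a 1024x1024 texture.
constexpr int kLightmapSize = 128;
constexpr int kAtlasSize = 1024;
constexpr int kTilesPerRow = kAtlasSize / kLightmapSize;  // 8

struct UV { float u, v; };

// Maps a lightmap-space texcoord (0..1) of lightmap `lm_index` into atlas
// space, assuming tiles are placed row by row.
inline UV ToAtlasUV(int lm_index, UV uv) {
    assert(lm_index >= 0 && lm_index < kTilesPerRow * kTilesPerRow);
    const int tile_x = lm_index % kTilesPerRow;
    const int tile_y = lm_index / kTilesPerRow;
    return {(tile_x + uv.u) / kTilesPerRow, (tile_y + uv.v) / kTilesPerRow};
}
```

With bilinear filtering, samples at tile borders may pick up texels from the neighbouring lightmap; leaving a texel of padding around each tile is a common remedy.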

On a GF3 + Athlon XP 1700 it runs at 50 fps (to my own surprise).
It even reaches 100 fps with lower patch tessellation.
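Q3 patch surfaces are grids of control points whose 3x3 sub-patches share their edge rows. Each 3x3 group is a biquadratic Bezier surface, evaluated in each direction as B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2. The sketch below tessellates one 3x3 patch into an n x n vertex grid. With n = 8 it gives the 8x8 vertices mentioned above, and a smaller n is the "lower patch tessellation" that raised the frame rate. `Vec3`, `Quadratic` and `TessellatePatch` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Quadratic Bezier: B(t) = (1-t)^2 p0 + 2t(1-t) p1 + t^2 p2.
inline Vec3 Quadratic(const Vec3& p0, const Vec3& p1, const Vec3& p2, float t) {
    const float a = (1 - t) * (1 - t), b = 2 * t * (1 - t), c = t * t;
    return {a * p0.x + b * p1.x + c * p2.x,
            a * p0.y + b * p1.y + c * p2.y,
            a * p0.z + b * p1.z + c * p2.z};
}

// Tessellates one 3x3 patch (ctrl[row][col]) into an n x n grid, row-major,
// so vertex (row j, column i) is stored at index j * n + i. Requires n >= 2.
std::vector<Vec3> TessellatePatch(const std::array<std::array<Vec3, 3>, 3>& ctrl, int n) {
    assert(n >= 2);
    std::vector<Vec3> out;
    out.reserve(static_cast<std::size_t>(n) * n);
    for (int j = 0; j < n; ++j) {
        const float t = static_cast<float>(j) / (n - 1);
        // Evaluate each control column at t, giving one quadratic curve across.
        const Vec3 c0 = Quadratic(ctrl[0][0], ctrl[1][0], ctrl[2][0], t);
        const Vec3 c1 = Quadratic(ctrl[0][1], ctrl[1][1], ctrl[2][1], t);
        const Vec3 c2 = Quadratic(ctrl[0][2], ctrl[1][2], ctrl[2][2], t);
        for (int i = 0; i < n; ++i) {
            const float s = static_cast<float>(i) / (n - 1);
            out.push_back(Quadratic(c0, c1, c2, s));
        }
    }
    return out;
}
```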

Hi!

I wonder whether it is always faster for an engine to use vertex arrays. Let's imagine an engine like Aftershock that has a separate frontend and backend. The frontend does the scene culling and fills a buffer of vertex coordinates and texture coordinates that is then passed to glDrawElements. The problem is that you need to copy the vertices from the frontend into that array using memcpy. Now imagine you have the following vertex structure for an engine:

struct Vertex
{
	Vector3 pos;      // Position
	float u, v;       // Texture coordinates
	Vector3 basis[3]; // Tangent space basis
};

This vertex is 56 bytes (assuming Vector3 is three floats), so you have to copy 56 bytes for every vertex, and I guess that's pretty slow.
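The 56 bytes come from one `Vector3` position, two floats and three basis `Vector3`s (12 + 8 + 36). The sketch below computes the per-frame copy cost. It assumes `Vector3` is three 4-byte floats without padding, and for comparison it also counts 32-bit indices into vertex data that stays in place. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Vector3 { float x, y, z; };  // assumed: three 4-byte floats, no padding

struct Vertex {
    Vector3 pos;       // Position
    float u, v;        // Texture coordinates
    Vector3 basis[3];  // Tangent space basis
};
static_assert(sizeof(Vertex) == 56, "12 + 8 + 36 bytes");

// Bytes memcpy'd per frame when every visible vertex is copied...
constexpr std::size_t BytesCopyingVertices(std::size_t vertex_count) {
    return vertex_count * sizeof(Vertex);
}

// ...versus writing only 32-bit indices that refer to vertex data that never moves.
constexpr std::size_t BytesWritingIndices(std::size_t index_count) {
    return index_count * sizeof(std::uint32_t);
}
```

For 20,000 entries that is 1,120,000 bytes of vertex copies versus 80,000 bytes of indices. A triangle list usually has more indices than vertices, so this compares cost per entry, not per scene.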

If you used immediate mode, you could just keep a pointer to your vertex rather than copying the complete vertex structure.

Hmm, I haven't tested that yet, but wouldn't storing the pointer and calling glVertex3f be faster than copying all 56 bytes into the vertex array?

Could someone please give a comment on that?

Thanks
LaBasX2

First of all, since the level data is static, there really should be no need to copy it every frame, even when using vertex arrays. If there is, your level data structure isn’t very good when it comes to rendering. And even if you had dynamic geometry, CVAs would probably be faster than immediate mode, not to mention VAR or VAO.
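Suppose the level's vertex and index data are built once at load time, for example with each face's triangles stored contiguously and the faces grouped by texture/lightmap. Then each frame only has to record which index ranges are visible, and neighbouring visible ranges can be merged so that one glDrawElements call covers them. The sketch below works under those assumptions, with a hypothetical `IndexRange` type. It is meant to be run per texture/lightmap group and expects each face at most once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A face's triangles as a range in a static index buffer built at load time.
struct IndexRange { std::uint32_t first, count; };

// Merges visible ranges that touch, so each merged range can be drawn with a
// single glDrawElements call (byte offset = first * sizeof(index)).
std::vector<IndexRange> MergeRanges(std::vector<IndexRange> ranges) {
    std::sort(ranges.begin(), ranges.end(),
              [](const IndexRange& a, const IndexRange& b) { return a.first < b.first; });
    std::vector<IndexRange> merged;
    for (const IndexRange& r : ranges) {
        if (!merged.empty() && merged.back().first + merged.back().count == r.first)
            merged.back().count += r.count;  // directly follows the previous range
        else
            merged.push_back(r);
    }
    return merged;
}
```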

Originally posted by LaBasX2:
The frontend does the scene culling and creates a buffer for vertex coordinates and texture coordinates that can be accessed by the glDrawElements function. The problem is that you need to copy the vertex from the frontend into that array using memcpy.

If you would use immediate mode, it would be possible to create just a pointer to your vertex rather than copying the complete vertex structure.

Could someone please give a comment on that?

In this case, yes, immediate mode would likely be a little faster. But why would one want to cull every vertex instead of culling by object? OpenGL does culling and clipping per vertex on its own, and on today's GPUs it's certainly faster to do coarse culling by object and leave the rest to the hardware.
So in your app you wouldn't cull individual vertices but rather whole sets of primitives, and hence whole vertex arrays (objects), at once.
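Culling by object usually means testing a bounding box against the six frustum planes, which is presumably what the original `InFrustum(Leaves[x].mins, Leaves[x].maxs)` call does. The sketch below shows one common version, the "positive vertex" test, which checks only the box corner farthest along each plane normal. The `Plane` convention (inside when dot(n, p) + d >= 0) and the function name are assumptions. Q3 stores leaf and node bounds as integers, so they would be converted to float first.

```cpp
#include <array>
#include <cassert>

// Assumed convention: a point p is inside the plane when dot(n, p) + d >= 0.
struct Plane { float nx, ny, nz, d; };

// Returns false only if the box lies entirely outside some plane.
// The test is conservative: a few boxes near the frustum corners still pass.
bool AabbInFrustum(const std::array<Plane, 6>& planes,
                   const float mins[3], const float maxs[3]) {
    for (const Plane& p : planes) {
        // Box corner farthest along the plane normal (the "positive vertex").
        const float x = p.nx >= 0 ? maxs[0] : mins[0];
        const float y = p.ny >= 0 ? maxs[1] : mins[1];
        const float z = p.nz >= 0 ? maxs[2] : mins[2];
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0) return false;
    }
    return true;
}
```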

Originally posted by Humle:
DeathWish's reply isn't true, right?

Yes, it's true. If the view frustum lies entirely on one side of a node's splitting plane, you can immediately cull everything on the other side of that node. That's one of the main reasons for using BSP trees.
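In terms of the original `Draw()`, node culling replaces the flat `for (x=0; x<NumLeaves; x++)` loop with a walk from the root (node 0). If a node's bounding box fails the frustum test, its whole subtree is skipped without visiting any of its leaves. The sketch assumes the Q3 convention that a negative child index `-(i + 1)` refers to leaf `i`. `Node` and `Leaf` are hypothetical subsets of the node and leaf lumps, and the frustum and PVS tests are passed in (for instance a box-vs-plane check and the TESTVIS bit test).

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical subsets of the Q3 node and leaf lumps (bounds are integers in Q3).
struct Node { int children[2]; int mins[3]; int maxs[3]; };
struct Leaf { int cluster; int mins[3]; int maxs[3]; };

using BoxTest = std::function<bool(const int* mins, const int* maxs)>;
using ClusterTest = std::function<bool(int cluster)>;

// Walks the tree from node `index`; a child index < 0 refers to leaf -(index + 1).
// If a node's box is outside the frustum, its whole subtree is skipped.
void CollectVisibleLeaves(const std::vector<Node>& nodes, const std::vector<Leaf>& leaves,
                          int index, const BoxTest& in_frustum, const ClusterTest& in_pvs,
                          std::vector<int>& out) {
    if (index < 0) {
        const int leaf = -(index + 1);
        const Leaf& l = leaves[leaf];
        if (l.cluster >= 0 && in_pvs(l.cluster) && in_frustum(l.mins, l.maxs))
            out.push_back(leaf);
        return;
    }
    const Node& n = nodes[index];
    if (!in_frustum(n.mins, n.maxs)) return;  // cull the entire subtree at once
    CollectVisibleLeaves(nodes, leaves, n.children[0], in_frustum, in_pvs, out);
    CollectVisibleLeaves(nodes, leaves, n.children[1], in_frustum, in_pvs, out);
}
```

The returned leaf indices would take the place of the `x` loop. The camera-cluster check and the per-face work for each leaf stay as before.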

Well! Then I am lost… again!
Where exactly should I cull the nodes in my rendering loop (see the first post)?