Slowdown on the 22nd glDrawElements

I have a speed bottleneck in my 3D engine. I render a number of objects with the same mesh (~350 triangles) and no texture, using a parallel projection (so they are all exactly the same size on screen). I can render 21 at a good frame rate, but the 22nd call to glDrawElements() takes much longer than the others: every other call takes around 0.3 ms, but the 22nd takes 20 ms! The slowdown also seems to depend on the display size. I can render 80 objects and ONLY the 22nd is slow.

Any clues?

/Jonas

Not sure, but it could be that OpenGL has to finish rendering some of the triangles before it can add the 22nd call to the queue. Just a guess.

Timing is not very accurate unless you have a glFinish() inside the timing loop. This is especially true on hardware T&L cards like the GeForces and some Radeons.

Sort of sounds like you're coming in around that 64 KB mark.
Post a short piece of code showing how you do the rendering.

Every render pass I do this:

VertexBuffer:

glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, m_coords.begin()->data());

PrimitiveList:

if(m_trilist.size())
{
	primitiveTimer.startTimer();
	glDrawElements(GL_TRIANGLES, m_trilist.size()*3, GL_UNSIGNED_SHORT, m_trilist.begin()->data());
	float time = primitiveTimer.stopTimer();
}

I can't see anything wrong there, though is it necessary to enable the various arrays every time? Personally, I leave the vertex array and the texture unit 0 coordinate array enabled all the time, because everything I draw with glDrawElements uses texture coordinates and vertices. Perhaps your timer function isn't that accurate.
You're asking for something like this:
start timer
draw a few things
stop timer
That's going to take a lot less than 1 ms.

A quick note: you can't measure the rendering time with this code:

primitiveTimer.startTimer();
glDrawElements(GL_TRIANGLES, m_trilist.size()*3, GL_UNSIGNED_SHORT, m_trilist.begin()->data());
float time = primitiveTimer.stopTimer();

Why? Because you have only measured how long it takes OpenGL to store the information about what it has to render. The rendering itself happens afterwards, while your code continues. glFinish() waits until it has all finished, glFlush() is related somehow, and SwapBuffers calls glFinish or glFlush, too.

So use this code:

primitiveTimer.startTimer();
glDrawElements(GL_TRIANGLES, m_trilist.size()*3, GL_UNSIGNED_SHORT, m_trilist.begin()->data());
glFinish( );
float time = primitiveTimer.stopTimer();

And call glFinish() before your first measurement, too (or glFlush(); I really don't know the difference, anyone?).

It is also important to take an average: let the code loop a few thousand times and time the whole run.
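A minimal sketch of that averaging, assuming C++11 std::chrono instead of the Windows timer used in this thread. averageMillis is a hypothetical helper: it runs a callable many times and divides the total elapsed time by the iteration count. In a GL program the callable would issue the draw calls and end with glFinish(); otherwise only command submission is measured.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `iterations` times and returns the mean wall-clock time per
// iteration in milliseconds. `work` is any callable (hypothetical example:
// a lambda that issues the draw calls and then calls glFinish()).
template <typename Work>
double averageMillis(Work work, int iterations)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;  // monotonic clock, never jumps back

    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    // Implicit conversion to a floating-point millisecond duration.
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

    return elapsed.count() / iterations;
}
```

For example, averageMillis([&] { drawAllObjects(); glFinish(); }, 5000), where drawAllObjects is a placeholder for your own rendering code, gives the mean cost of one fully rendered batch rather than the cost of queuing it.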

Inserting a glFlush() after glDrawElements() doesn't change anything. The timing is unchanged!

Timer code (Windows):

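#include <windows.h>

// oldTime, newTime (LARGE_INTEGER) and totalTime (float) are assumed to be Timer members.
void Timer::startTimer()
{
	QueryPerformanceCounter(&oldTime);	// tick count at start
}

float Timer::stopTimer()
{
	QueryPerformanceCounter(&newTime);	// tick count at stop
	LARGE_INTEGER countsPerSecond;
	QueryPerformanceFrequency(&countsPerSecond);	// ticks per second
	// Elapsed ticks divided by ticks per second gives elapsed seconds.
	float time = ((float)(newTime.QuadPart - oldTime.QuadPart)) / (float)countsPerSecond.QuadPart;
	totalTime += time;
	return time;
}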

Inserting a glFinish() after each glDrawElements() makes the first call take 25 ms and the rest 1.2 ms.
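For comparison, a portable sketch of the same timer using C++11 std::chrono::steady_clock instead of QueryPerformanceCounter. The class name ChronoTimer is made up for this example; it keeps the startTimer()/stopTimer() interface and the running total of the Windows version. It has the same limitation discussed above: it measures CPU-side time only, so GPU work is included only if glFinish() is called before stopTimer().

```cpp
#include <cassert>
#include <chrono>

// Hypothetical portable stand-in for the QueryPerformanceCounter-based Timer.
class ChronoTimer
{
    using Clock = std::chrono::steady_clock;  // monotonic, never jumps backwards

public:
    void startTimer()
    {
        m_start = Clock::now();
        m_running = true;
    }

    // Returns the seconds elapsed since startTimer() and adds them to the total.
    float stopTimer()
    {
        assert(m_running && "stopTimer() called without startTimer()");
        m_running = false;
        const std::chrono::duration<float> elapsed = Clock::now() - m_start;
        m_totalTime += elapsed.count();
        return elapsed.count();
    }

    float totalTime() const { return m_totalTime; }

private:
    Clock::time_point m_start{};
    float m_totalTime = 0.0f;
    bool m_running = false;
};
```

steady_clock is used rather than system_clock because it is guaranteed to be monotonic, which is what interval timing needs.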

Sounds like some kind of initialization. What you are interested in is the steady-state performance, not this one-time overhead. It sounds like it’s not something to worry about.

You need to do a Finish after the last DrawElements only, and you need to make the benchmark long enough that if there is any kind of one-time overhead, it’s not significant.

Matt
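A minimal sketch of that kind of benchmark, assuming C++11 std::chrono. steadyStateMillisPerPass is a hypothetical helper; drawPass stands for one pass of draw calls and finish for glFinish(). The untimed warm-up passes are an extra assumption, not part of Matt's advice; the key points taken from his post are the single finish after the last timed pass and a run long enough that one-time costs are amortized.

```cpp
#include <cassert>
#include <chrono>

// 1. Run untimed warm-up passes so one-time driver work happens outside the
//    measurement, then sync.
// 2. Time many passes with a single sync after the last one.
// Returns the mean milliseconds per timed pass.
template <typename DrawPass, typename Finish>
double steadyStateMillisPerPass(DrawPass drawPass, Finish finish,
                                int warmupPasses, int timedPasses)
{
    assert(warmupPasses >= 0 && timedPasses > 0);

    for (int i = 0; i < warmupPasses; ++i)
        drawPass();
    finish();  // drain the queue before timing starts

    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < timedPasses; ++i)
        drawPass();
    finish();  // one sync, after the last timed pass only
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

    return elapsed.count() / timedPasses;
}
```

Because there is only one sync inside the timed region, the CPU and GPU can overlap as they would in a real frame loop, and any first-call overhead is spread over timedPasses.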

This IS a problem. If one call to glDrawElements takes 20 ms, the frame rate can be no higher than 50 FPS, even if I do nothing else! I need to get rid of this because it is the bottleneck of the engine.

If I am not doing anything wrong, this means I cannot have more than 21 objects with 300 triangles each without the computer crawling (P-III 750 MHz, GeForce2).
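The 50 FPS figure is frame rate = 1000 / frame time in milliseconds. A small sketch of that arithmetic, with hypothetical helper names, that also adds up the per-call times reported in the first post (measured without glFinish(), so treat them as rough):

```cpp
#include <cassert>

// Frame rate implied by a frame time: fps = 1000 / milliseconds per frame.
double fpsFromFrameMillis(double frameMillis)
{
    assert(frameMillis > 0.0);
    return 1000.0 / frameMillis;
}

// Total frame time for `fastCalls` draws at `fastMillis` each plus
// `slowCalls` draws at `slowMillis` each; all other per-frame work is ignored.
double frameMillis(int fastCalls, double fastMillis, int slowCalls, double slowMillis)
{
    return fastCalls * fastMillis + slowCalls * slowMillis;
}
```

With 21 calls at 0.3 ms and one at 20 ms, frameMillis gives about 26.3 ms and fpsFromFrameMillis about 38 FPS, before any other per-frame work.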

This behavior makes perfect sense. The first call to glFinish() that you make forces the renderer to sync up with your code. And since the driver is probably buffering up OpenGL commands until it hits a finish, who knows how much work it is doing on your first call.

You should force a finish before the first call, and then after each call. The first call will still take longer, because the driver is allocating and moving data around for your vertex array, color array, normal array, texture coordinate arrays, etc. Each subsequent call will then be fast, as long as you don't change those arrays.

The only additional optimizations you could do would be to compile the arrays (EXT_compiled_vertex_array) if you are not changing the data between glDrawElements calls, and to implement vertex array range (NV_vertex_array_range) with NV_fence.

Keith

If I do this:

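glFinish();	// drain the command queue before the first measurement
while (blahblah)	// loop condition elided in the original post
{
	timer.startTimer();
	glDrawElements(GL_TRIANGLES, m_trilist.size()*3, GL_UNSIGNED_SHORT, m_trilist.begin()->data());
	glFinish();	// wait until this draw has actually been rendered
	float time = timer.stopTimer();
}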

…then the first glDrawElements() takes 14 ms and the rest around 1.4 ms each.

/Jonas