GL_QUADS slower in a display list than in immediate mode

This is a continuation of an earlier thread that knackered started back in May (http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/006395.html).

I’m rendering quads in groups of 100 to 1,200 at a time. Drawing them in immediate mode is about 15% faster than drawing them from a display list on both a GeForce3 Ti500 and a Radeon 9700 Pro.

Since display lists aren’t making it, I’m trying NV_vertex_array_range and ATI_vertex_object.

For the ATI implementation, I’m using ATI_STATIC because the geometry doesn’t change. The frame rate went from 60 to 100 Hz using this technique - exactly the kind of increase I was looking for. Display lists ran around 52.

For the NVIDIA implementation, I feel like I must be doing something wrong, because I’m getting the same frame rate using VAR as immediate mode. Because the memory is write-once / read-a-lot, I allocate the buffer with wglAlloc(size, 1.0, 0.0, 1.0), but I’ve tried a number of different parameters. No increase. Same result on a GF3 and GF4.

Using VAR in the past, I’ve gotten tremendous speedups for dynamic geometry (by using fences, etc), but this is my first use with static geometry. Are there different guidelines in this circumstance that I’m not aware of?

I’m getting the same frame rate using VAR as immediate mode. Because the memory is write-once / read-a-lot, I allocate the buffer with wglAlloc(size, 1.0, 0.0, 1.0), but I’ve tried a number of different parameters.

First, the read/write frequency is for CPU-based reads/writes, not reads/writes by the graphics hardware. As such, you intend to write once and read never: (0.0, 0.0, 1.0).

Secondly, the NVIDIA docs specify which settings will actually allocate good memory (AGP/video) and which ones won’t. I believe that (0.0, 0.0, 1.0) is requesting video memory, but you’d have to check their performance docs to be sure.
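To make the parameter discussion concrete, here is a rough sketch in C of asking for video memory first and falling back to AGP memory. Treat all of it as assumptions: VarAllocFn only mirrors the parameter order of wglAllocateMemoryNV (size, read frequency, write frequency, priority), the 0.5 priority used as the AGP hint is remembered guidance rather than something confirmed in this thread, and alloc_static_var, VarMemKind and stub_alloc_agp_only are hypothetical names made up for the illustration. The stub exists only so the helper can be tried without a GL context.

```c
#include <stddef.h>

/* Mirrors the parameter order of wglAllocateMemoryNV:
   size, CPU read frequency, CPU write frequency, priority. */
typedef void *(*VarAllocFn)(int size, float read_freq,
                            float write_freq, float priority);

/* Which kind of memory was REQUESTED; the driver gives no guarantee. */
typedef enum { VAR_MEM_NONE, VAR_MEM_VIDEO, VAR_MEM_AGP } VarMemKind;

/* Hypothetical helper: the CPU writes once and never reads, so both
   frequencies are 0. Priority 1.0 is taken as the video memory hint
   (as suggested above); 0.5 as the AGP hint is an assumption. */
static void *alloc_static_var(VarAllocFn alloc, int size, VarMemKind *kind)
{
    void *p = alloc(size, 0.0f, 0.0f, 1.0f);
    if (p) { *kind = VAR_MEM_VIDEO; return p; }

    p = alloc(size, 0.0f, 0.0f, 0.5f);
    if (p) { *kind = VAR_MEM_AGP; return p; }

    *kind = VAR_MEM_NONE;
    return NULL;
}

/* Stand-in allocator for experimenting without a GL context: pretends
   video memory is exhausted, so only requests with priority < 1.0 that
   fit in a small static pool succeed. */
static void *stub_alloc_agp_only(int size, float read_freq,
                                 float write_freq, float priority)
{
    static char pool[1024];
    (void)read_freq;
    (void)write_freq;
    return (priority < 1.0f && size <= (int)sizeof pool) ? pool : NULL;
}
```

In a real program the allocator argument would be a thin wrapper around the entry point returned by wglGetProcAddress("wglAllocateMemoryNV"). Logging which request succeeded at least shows what the buffer was asked to be, which helps when every parameter combination seems to give the same frame rate.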

You’re right - I screwed that one up. But, I have tried 0, 0, 1 and get the same rate. In fact, I’ll be surprised if there’s any meaningful combination of these numbers I haven’t tried!

Update, for what it’s worth:

I’m drawing the quads using glDrawArrays() - every vertex is unique and has a 4ub color, 2f texcoord, and 3f position. They never change.
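For anyone following along, the vertex format described above could be laid out roughly like this in C. The field order is an assumption: it follows OpenGL's GL_T2F_C4UB_V3F interleaved format, and the post does not say how the arrays are actually arranged in memory. QuadVertex and quad_batch_bytes are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One vertex, in the order used by GL_T2F_C4UB_V3F (assumed layout). */
typedef struct {
    float   s, t;        /* 2f texcoord */
    uint8_t r, g, b, a;  /* 4ub color   */
    float   x, y, z;     /* 3f position */
} QuadVertex;

/* 8 + 4 + 12 bytes; holds on common compilers, where float is 4-byte aligned. */
static_assert(sizeof(QuadVertex) == 24, "unexpected padding in QuadVertex");

/* Bytes needed for a batch of quads with 4 unique vertices each. */
static size_t quad_batch_bytes(size_t quad_count)
{
    return quad_count * 4 * sizeof(QuadVertex);
}
```

With this layout, the largest batch mentioned in the thread (1,200 quads) comes to 1,200 x 4 x 24 = 115,200 bytes of vertex data, and the smallest (100 quads) to 9,600 bytes.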

glDrawArrays() invoked with standard vertex arrays yields ~60 Hz.

glDrawArrays() encapsulated in a display list yields ~52 Hz.

The same call using ATI_vertex_object yields ~90 Hz. NV_vertex_array_range yields ~58 Hz.