OpenGL multitexturing and performance

Hi everyone. This is my problem:
I’m displaying a rather big mesh (~500000 vertices) using triangle strips, as a display list.
When I apply just one texture I get around 120 fps.
When I try to apply a second one, as a detail texture, using glMultiTexCoord2fARB, my fps drop to about 3 (!!!), which is the same rate I get without display lists (immediate rendering mode).

I was able to track down the problem to a single issue…
I have the same bad performance if (without activating multitexturing at all) I duplicate the tex coord functions.

Which is: if I do, for every vertex
glTexCoord2f(u, v);
glTexCoord2f(u, v); /// I know… this is stupid…

I have the same (bad) performance as

glMultiTexCoord2fARB(GL_TEXTURE0_ARB, u, v);
if(tDrawMode == T_TEXDETAIL)
glMultiTexCoord2fARB(GL_TEXTURE1_ARB, u, v);

Any hints about this?
Thanks all in advance.

First, make sure you have the latest available drivers for your graphics card.

Second, when posting performance questions, please post the hardware, driver version, OS, CPU and RAM information of your system. Otherwise performance comparisons are meaningless.

If this is on PC hardware, then AGP aperture size may matter.

Try slicing your model in several smaller display lists; it may be that you’re hitting some fixed upper size limit for efficient display lists on your hardware in question.

Perhaps you are just running into a limit on the number of commands a display list can effectively compile before dropping to some fallback (i.e. 500,000 * (1 position (3 floats) + 1 tex coord (2 floats)) * 4 bytes per float ~ 10 MB). I would try using a VBO for something this big.
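To put a number on that estimate, here is a rough sketch of the arithmetic. It assumes every attribute component is a 4-byte float and that the driver stores nothing beyond the raw attributes; mesh_bytes is a made-up helper name, not anything from the poster's code.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of raw attribute data for a mesh, assuming each component is a
   float and the driver adds no per-vertex overhead (an assumption). */
size_t mesh_bytes(size_t vertex_count, size_t floats_per_vertex)
{
    assert(floats_per_vertex > 0);
    return vertex_count * floats_per_vertex * sizeof(float);
}
```

With 4-byte floats, mesh_bytes(500000, 5) gives 10,000,000 bytes for position plus one texcoord set, and mesh_bytes(500000, 7) gives 14,000,000 bytes once a second texcoord set is added.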

Ok, some details about my hardware:
Running a PC box, Pentium 4 1.7, 512 MB RAM, XP SP1.
Graphics: Radeon 9500 Pro with ATI Catalyst 4.5.
MB is an Asus P4V800-X, AGP aperture size 256 MB, should support AGP 8x although it’s greyed out as “auto” in the BIOS.

some new benchmarks:

div: 512 (~525000 vertices):   DL - 2 TEX:   5 fps
                               DL - 1 TEX: 140 fps
                            NO-DL - 2 TEX:   5 fps
                            NO-DL - 1 TEX:   8 fps
div: 128 (~33000 vertices):    DL - 2 TEX:  70 fps
                               DL - 1 TEX: 250 fps
                            NO-DL - 2 TEX:  72 fps
                            NO-DL - 1 TEX:  96 fps
div: 64  (~2100 vertices):     DL - 2 TEX: 220 fps
                               DL - 1 TEX: 256 fps
                            NO-DL - 2 TEX: 214 fps
                            NO-DL - 1 TEX: 226 fps

So it seems that with 2 textures and a lot of vertices I lose all the benefits of a display list… Would switching to vertex arrays solve the problem?

You’re probably using 32 bytes per vertex – 12 bytes for position, 16 bytes for a pair of texcoords, 4 bytes for color; please correct me if that’s wrong.

If that’s true, your problems start to appear when you exceed 1 MiB of vertex data.
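As a quick check of where that 1 MiB threshold would fall, here is a small sketch that takes the 32-byte layout guess at face value. max_vertices and ONE_MIB are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ONE_MIB ((size_t)1024 * 1024)

/* Largest number of whole vertices that fits in budget_bytes. */
size_t max_vertices(size_t budget_bytes, size_t bytes_per_vertex)
{
    assert(bytes_per_vertex > 0);
    return budget_bytes / bytes_per_vertex;
}
```

max_vertices(ONE_MIB, 32) is 32768, just below the ~33000 vertices of the div 128 mesh, which is where the 2 TEX display list first falls behind. With only one texcoord set (24 bytes under the same guess) the limit would be 43690 vertices, so that mesh would still fit.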

My best guess is that the driver puts large (>1MiB) display lists into AGP memory, and your AGP subsystem doesn’t work properly.

Check ATI’s “Smartgart” tab in display properties to see what your current AGP mode is. If it’s anything less than 4x, you should head here to get the drivers for your VIA chipset.

Just for benchmarking purposes I don’t use colors or normals, so that’s a total of 3+2+2 floats per vertex. I checked the AGP mode and it was “off”… After reinstalling the VIA drivers I have “8x” but no fastwrite. With this new configuration I got the following:

AGP 8x (no fastwrite)
div: 512 (~525000 vertices):   DL - 2 TEX:  16 fps
                               DL - 1 TEX: 142 fps
                            NO-DL - 2 TEX: 9.5 fps
                            NO-DL - 1 TEX:  12 fps

which is a good improvement, but I still don’t understand the big difference between 1 and 2 textures…

I’m not sure what the cause is for the slowdown, but I’m guessing the ATI drivers are screwed with large geometry + DL. It won’t be the first time.

Probably best to give VBO a try, and tell it to put the data in video memory.

Are you doing anything else in your render loop, or is this just a plain renderer?

Well, this is part of a large project. But, in order to investigate this issue I just call the mesh list, draw a sphere at the origin (to know where it is) and draw (using glutBitmap) the fps values in the top left corner of the window.

Anyway, correct me if I’m wrong, but if I draw the scene with 1 tex at 140 fps, then I should be able to do multitexturing with two-pass rendering at (theoretically) 70 fps. The multitexturing extension should increase the performance, shouldn’t it?
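The two-pass estimate is plain frame-time arithmetic. A minimal sketch, assuming the cost of a frame is nothing more than the sum of identical passes (no state-change or blending overhead); both function names are invented for the example.

```c
#include <assert.h>

/* Milliseconds spent on one frame at a given frame rate. */
double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Frame rate expected when the same mesh is drawn `passes` times per
   frame, assuming each pass costs as much as the single-pass frame. */
double multipass_fps(double single_pass_fps, int passes)
{
    assert(single_pass_fps > 0.0 && passes > 0);
    return single_pass_fps / (double)passes;
}
```

At 140 fps one pass takes about 7.1 ms, so two passes should land near 70 fps. The measured 5 fps corresponds to 200 ms per frame, far more than a second pass alone would explain.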

Before worrying about all this, the first thing to do is use VBOs or VAR and upload your vertex and texcoord data in bulk. 500,000 calls to glVertex and glTexCoord is crazy.
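For what "in bulk" could look like on the CPU side, here is a sketch that packs position and both texcoord sets into one interleaved array. The Vertex struct, fill_grid and detail_scale are hypothetical names for illustration, and the flat grid is only a stand-in for the real mesh.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved layout: position plus two texcoord sets. */
typedef struct {
    float pos[3];
    float uv0[2];   /* base texture, unit 0 */
    float uv1[2];   /* detail texture, unit 1 */
} Vertex;

/* Fills `out` with an n x n flat grid in the XZ plane and returns the
   number of vertices written. detail_scale tiles the second coord set. */
size_t fill_grid(Vertex *out, size_t n, float detail_scale)
{
    assert(out != NULL && n >= 2);
    size_t count = 0;
    for (size_t row = 0; row < n; ++row) {
        for (size_t col = 0; col < n; ++col) {
            float u = (float)col / (float)(n - 1);
            float v = (float)row / (float)(n - 1);
            Vertex *vtx = &out[count++];
            vtx->pos[0] = u;
            vtx->pos[1] = 0.0f;
            vtx->pos[2] = v;
            vtx->uv0[0] = u;
            vtx->uv0[1] = v;
            vtx->uv1[0] = u * detail_scale;
            vtx->uv1[1] = v * detail_scale;
        }
    }
    return count;
}
```

An array like this can then be handed over once, using sizeof(Vertex) as the stride for glVertexPointer and for glTexCoordPointer (selecting each unit with glClientActiveTextureARB first), instead of issuing per-vertex calls every frame.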

More on this topic… After some days spent investigating the code I wrote, I tried to run the program on different computers, with the following results…

ATI Radeon 8500 DDR (64 MB)

1 Tex: ~ 36 FPS
2 Tex: ~ 15 FPS

On an NVIDIA GeForce Go (don’t remember the model) both versions of the program ran at about the same speed (!!)

Remember that I had ~140 FPS with 1 tex and ~14 with 2 tex on my Hercules 3D Prophet 9500 Pro.

I also tried vertex arrays, but saw the same (unrealistic) difference in performance.

Anyway, I’m really confused. Is there anyone who would like to run the programs on their computer just to tell me the differences in performance?

Here is the link:

Thanks a lot in advance to everyone…

valterc, for what it’s worth

1: ~170 fps
2: ~138 fps

system: 2.4GHz p4, XP sp1, nvidia fx 5900 (128Mb) v53.03

It seems likely that by adding a second texture coordinate set, you’re exceeding some fixed buffer size limit in the drivers for your card, and falling back to a slower path.

Try rendering a scene half the size, and see whether there is any difference between 1 and 2 textures. If not, then it’s very likely just a size issue.

Second, try running the same benchmark with 1/4 the window size. If it runs much faster, then you know it’s fillrate, not geometry. If you don’t have MIP maps enabled, you should generate and enable them, as they will reduce texture cache thrashing. GL_NEAREST_MIPMAP_NEAREST is the fastest texture filtering mode (faster than plain GL_NEAREST).

If it’s always the case that 2 textures is slower, then there’s some optimization in the ATI driver where they recognize and fast-path single texture coordinates, but you get some generic, slower path for two texture coordinates.

Slightly OT:
I’ve actually found GL_LINEAR_MIPMAP_NEAREST (“bilinear”) to be slightly faster than GL_NEAREST_MIPMAP_NEAREST. I guess this must be because bilinear filtering behaves a bit like data prefetching.

Ok, some facts:

a) I use GL_LINEAR_MIPMAP_NEAREST for minification
b) If I put vertices, 1 tex coord, normals and colors inside the display list I only lose a few FPS compared with just vertices and tex coord. The big drop is when I put in 2 tex coords, even if I don’t apply the second texture.
c) Using vertex arrays inside a display list I get more or less the same performance, probably slightly slower.

Now for some new benchmarks. I had to disable AGP, but this is another problem…

So, with a different window size:

1 tex 800x600 ~131 fps
1 tex 200x150 ~146 fps
2 tex 800x600 ~5 fps
2 tex 200x150 ~5 fps  

rendering with fewer vertices:

1/2 size 1 tex ~214 fps
1/2 size 2 tex ~10 fps
1/4 size 1 tex ~313 fps
1/4 size 2 tex ~20 fps
1/8 size 1 tex ~390 fps
1/8 size 2 tex ~38 fps

1/8 size means ~66000 vertices
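One way to read these numbers is as vertices pushed per second. A small sketch, assuming the mesh is drawn exactly once per frame; vertices_per_second is a name made up for this example.

```c
#include <assert.h>

/* Vertices submitted per second for a mesh drawn once per frame. */
double vertices_per_second(long vertices, double fps)
{
    assert(vertices > 0 && fps > 0.0);
    return (double)vertices * fps;
}
```

With 2 tex this gives 525000 * 5 = 2,625,000 for the full mesh and 66000 * 38 = 2,508,000 for the 1/8 mesh. If the 1/2 and 1/4 meshes have roughly 262500 and 131250 vertices (an assumption, those counts were not posted), they also come out at about 2.6 million per second. A throughput that stays flat across mesh and window sizes looks like a per-vertex bottleneck rather than fillrate. The 1 tex path shows no such ceiling: 525000 * 131 = 68,775,000 versus 66000 * 390 = 25,740,000.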

I also have the feeling that this is an ATI problem… NVIDIA benchmarks are more reasonable.

Hi all,

Recently, I’ve also had big problems using display lists. Surprise surprise, it started when I bought an ATI Radeon 9800XT.

I avoided this problem by using VBOs, as had already been suggested a number of times in this thread.

So I spent the last few hours to generate some VBO based code using the same textures.

You can download the windows executable at:

It uses 4 VBOs:

Blending between the two textures is controlled by pressing ‘b’ (towards tex1) and ‘B’ (towards tex2)

Movement is completely mouse controlled. Left button for rotation. Middle and right button for translation.

Resulting FPS is 60-80 FPS.



Well, using your exe I get around 27 fps, which is much better than what I get with my code.

But, really, what I expect from using the multitexturing (blending two textures) hardware extensions is a frame rate better than half the fps I get with just one texture. It really seems to me that the best solution for doing this is to draw the whole mesh twice (this way I get ~74 fps…)

It seems to me that this is not a multitexture access problem.

I also tested assigning the same texcoord buffer object to both texture units. This resulted in speeds of +/- 150 FPS.

So my guess is that the problem isn’t in the fragment pipeline, but that the extra texcoord buffer is overloading the vertex units.


I have no clue what the problem is… just wanted to point out that the application crashed on my computer. There was a lot of hard drive activity for about one minute, and then it crashed. I started it a second time and it crashed right away. This could just be my system, but it’s pretty up to date (2.6 GHz, Radeon 9800, 1 GB RAM).

It obviously works for two other people, so it’s probably just me, nothing to worry about; I just wanted to point it out.

Good luck

Can you post some code? The slowdown is quite incredible; I actually wouldn’t expect to see more than a 10% performance hit when using a second texture coordinate set.


At a complete guess, I would say the slowdown is being caused by the calculations necessary to blend the 2 textures together.