Slow quad for gradient background

Greetings to all.

I have run into a speed issue with my gradient background. What I am doing is fairly simple: I am using one quad with a different color at each corner for a gradient background. This works great on most systems; however, on some graphics boards it is SLOOOOOW.

I figured out that the problem is due to the size of the window and therefore the size of the quad. If you resize the window smaller, you reach a point when the background is drawn quickly.

I have considered breaking up the quad into 4 smaller chunks, but the problem would be how to interpolate the colors correctly so that the background wouldn’t look as if it was 4 separate regions.

BTW, I tried using 2 triangles in a strip and this didn’t make any difference.

I would really appreciate some help on this one.


Fill rate is fill rate. Doesn’t matter if it’s one quad or four quarter-quads. If your graphics card can’t fill well enough (like, the built-in “intel graphics” graphics) then you lose.

Unless, of course, the really SLOOOOW cards somehow were falling back to software for some reason.

The problem with the slow quads (or triangles) isn’t across the board. It only appears when they reach a certain size. For example, I can make the window just a couple of pixels smaller and then the quad fill is so fast that you can’t “see” it drawn. But if I stretch the window out just a bit, the quad fill is so slow that you can see each triangle of the quad being drawn from corner to corner. I haven’t timed it, but I would guess it takes two to three seconds.

I don’t know if it is falling back on software to handle the larger quads or not, but there is a HUGE difference between the two speeds.

I’m not sitting at the computer that causes the problem, but will post the graphics card when I can to see if this might shed some more light. I do know that it is an Nvidia chip and want to say that it is a Vanta card – it is definitely a low-end card.

Thanks for the info so far.

Definitely getting software rendering for the larger window. Here’s an example: on my old Riva 128 with a 4MB video card, I could only draw into a window of 800x600x16. Any larger than that and I had overfilled the card’s memory and got software rendering.

Well, it looks like Zed was right on. It doesn’t matter what size the quads (or triangles) are if the window is too big. Everything slows down on this graphics card when it reaches some threshold.

BTW, the graphics card is an Nvidia Vanta with 16MB of memory and I’m using the latest driver from Nvidia – which is a big improvement over the previous one.

So it sounds like there is nothing I can do if the user sizes his window beyond a certain size given this graphics card???

Thanks again for the help.


One other quick update. When I switched the number of colors used via the Display Properties from True Color (32 bit) to High Color (16 bit), it stays in the hardware mode (i.e. fast) with all sizes for the window.

Just thought I’d pass this along.


Which resolution do you use?

And what do you request when making the OpenGL window?
Depthbuffer - how many bits?
Alphabuffer - how many bits?
StencilBuffer - how many bits?

A bad combination of these can throw you into SW mode.
These modes are safe for other Nvidia boards:
16+16 bit = rgb/depth
32+24+8 bit = rgba/depth/stencil

A 16MB card should be able to handle most resolutions; my Vanta 16MB can do 1152x864 in 32-bit colour (it’s not exactly quick in 32-bit colour, mind). There’s prolly a bug in your code.
I’m assuming you use Windows. It sort of sounds like a problem I had a long time ago.
IIRC it was something to do with the WM_PAINT or WM_SIZE messages. Sorry I can’t be of more help.

Okay, here are some more details about the problem and the pixel format…

The screen resolution is 1280x1024 with 32 bits of color. The pixel format details are:
color bits - 32
accumulation bits - 64
depth bits - 24
stencil bits - 8

The problem occurs when the OpenGL window is around 1200 x 850.

Another very interesting fact is that when I force OpenGL to use the Software only driver, this is actually FASTER than the slow case with the hardware driver. So if it is going into software mode, it must be a software mode of the Nvidia driver rather than the Windows OpenGL driver.

As far as there being a bug in my code, I don’t understand how I could be doing something wrong that would cause this behavior. I can resize the window all I want as long as I don’t exceed a certain size.

It would be nice to determine exactly when this happens so that I could either warn the user or switch graphics context to the true software driver – which is faster for this size of a window. Any ideas how to figure this out?

Thanks for all the help thus far.

Wow! I would say, that explains it.

If you were to use a 1280x1024 window with all those attributes, it would take 20MB.

You say that the problem occurs around 1200x850; guess what, that’s just about 16MB RAM.

And didn’t Matt also say in another thread that the accumulation buffer isn’t accelerated on Nvidia HW???

I don’t think that you should expect HW acceleration while requesting that kind of resolution.

Well, I would at least expect software-level performance!! Even if I go beyond the hardware capabilities of the card, you would think that the performance wouldn’t be any worse than with the software-only driver.

When my window size gets beyond the aforementioned point, the performance becomes much worse than if I was using the software only driver. As soon as I resize my window back down, everything is fast again.

Pardon my ignorance, but how exactly do you come up with the fact that I’m using 16 MB. I tried doing a couple of calculations based on the various buffer sizes and the window size and didn’t get 16 MB.

Thanks again for the info.

If you mean “the Microsoft OpenGL driver” by the SW renderer, no, there is no reason to expect that our SW renderer will be as fast as MS’s, because we have little to no interest in SW rendering (being that we are a HW company), and because if someone hits SW rendering, that’s enough of a problem in the first place that they should be doing whatever it takes to avoid it. Furthermore, our SW renderer needs to support a lot more features than MS’s does; for example, our SW renderer supports multitexturing (not to mention all the other fancy things like register combiners).

  • Matt

You’ve got 16 bytes for each pixel (4 RGBA, 8 Accum, 3 Z-buffer, 1 Stencil); 16 x 1280 x 850 = roughly 17.4 million bytes, i.e. just over 16 MB.

Originally posted by karbuckle:
Pardon my ignorance, but how exactly do you come up with the fact that I’m using 16 MB. I tried doing a couple of calculations based on the various buffer sizes and the window size and didn’t get 16 MB.

Thanks again for the info.

(4 Bytes RGBA)+(1 Byte Stencil)+(3 Bytes Depth)= 8 Bytes (leaving out accumulation ’cause it’s not supported anyway)

8 Bytes * 1280 * 1024 * 2 =~ 20 Mbytes

this is the amount of memory used for both back and front buffer.

hope that helps


Thanks to all that have replied and helped me solve this problem. I have learned quite a few things from this round of discussions.

To “Matt”, I wasn’t trying to be critical of the hardware driver’s software support. I was just surprised at how much slower the HW driver’s SW version was compared to MS’s SW only driver. I just figured that you guys would be better at the SW support (in addition to the HW) than MS since this is your specialty.

Okay, one final question related to this topic… Is there any way to determine within my application that the HW driver has switched over to SW mode?

Thanks again for all the help.

Both wrong, the memory calculation goes like this:
xres * yres * (front_buffer_bytes_per_pixel + back_buffer_bpp + depth_buffer_bpp + stencil_buffer_bpp) here:
1280 * 1024 * (4 + 4 + 3 + 1) = 15 MB

In HighColor it’s:
1280 * 1024 * (2 + 2 + 2 + 0) = 7.5 MB

Relic, close but wrong. That’s still only an approximation of the amount of memory used.

Don’t ever rely on figuring out exactly how much memory we are using, because we do things behind your back that will leave you very confused when your calculations tell you one thing and observation shows you another.

  • Matt

So Matt, is there any way to know when the SW portion of the HW driver has kicked in?


Karbuckle, no, there’s not. OpenGL basically “requires” that all implementations support all features. It is, after all, an API for professional use.

What you are asking for is something like DX’s CapBits.

I heard funny stories about DX programmers using HW features that are reported as accelerated somewhere in the scene, and then later, when another HW feature is used, the aforementioned feature drops to SW.
So basically, they have to check the CapBits every time they want to use a feature - within the scene! - to be 100% sure… or just not use the feature at all.

I believe it was user clip planes on *cough* Nvidia hardware *cough* that get disabled when using more than one texture.

I would rather have it the OpenGL way, than going through that nightmare.

Matt, is Relic right?

Is there only one depthbuffer/stencilbuffer? I always thought that it was saved for the front buffer as well… though, now that I think of it, it doesn’t need to be.