Terrible ATI opengl performance

I wrote my own media centre software … for my living room :eek:
It has an OpenGL front-end renderer for video. It renders from DirectShow and simply passes a pointer to OpenGL for rendering, like so:

glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB, videoWidth, videoHeight, 0, pixelType, GL_UNSIGNED_BYTE, imageData);


My PC has a low-end ATI card (an X300 or X550). It is a new card that supports shaders etc.; it’s just fanless and low end (perfect for an HTPC). Anyway, under XP this works perfectly and performance is great. On Vista 32 and Vista 64, however, performance is terrible, to the point where I can’t watch anything at all. I can also see that CPU usage is very high when rendering video, about twice that of XP, although never quite 100%. The video renders fullscreen, so under Vista there isn’t the extra copy that windowed mode would require.

Has anyone else encountered this problem with ATI drivers under Vista? :eek: The PC I am using is hardly low end: 2 GB of RAM and a 3.4 GHz P4.

I was running identical software during my test on the same hardware; the only thing that was different was the OS (latest drivers etc.).

Do you use GDI on top of OpenGL?

If so, see the section “What All This Means for the OpenGL Developer”, sub-section “GDI compatibility notes”.

Also see the sub-section “Performance notes and other recommended practices”:

“Calling synchronization routines like glFlush, glFinish, SwapBuffers, or glReadPixels (or any command buffer submission in general) now incurs a kernel transition, so use them wisely and sparingly.”

It’s better to use glTexSubImage2D for updating the texture. Also make sure you use GL_CLAMP_TO_EDGE as the texture coordinate wrap mode. Additionally, make sure your source data type and format are not something weird (for instance, not GL_RGB with GL_UNSIGNED_BYTE). Furthermore, you might try GL_RGBA as the internal format instead.
My 2c :slight_smile:

You are doing it wrong ™ :slight_smile:

Don’t call glTexImage2D repeatedly. Call it once on startup and use glTexSubImage2D to upload data.

Don’t use GL_RGB as the internal format, as it is not supported natively. Use GL_RGBA.

For optimal performance, make sure that pixelType is something nice, such as GL_BGRA with GL_UNSIGNED_BYTE. Almost anything else will have to be swizzled by the driver.

The last item is the killer in OpenGL video, as decompressed frames tend to be in YUV or RGB or something ugly like those. BGRA is the way forward :slight_smile:

For extra performance, you can use PBOs and upload data from a secondary thread. If your target hardware doesn’t support that, you might be able to create more than one OpenGL context and upload data to multiple textures in parallel. You then render each texture in round-robin fashion, thus hiding upload latency.

Do you use GDI on top of OpenGL?

Nope :eek:
No glFlush etc. either.

glTexSubImage2D: I’ll definitely try this one. Looks like I should have been using it all along.

As for swizzling: I seem to recall that with NVIDIA, if you specify the internal format as RGB, it’ll actually store it as BGR to match the Windows pixel format. Is that assumption correct?

I’m also not sure whether RGBA would really be that much faster, since the data is always packed to a multiple of 4 bytes anyway.

All good suggestions though :slight_smile:

Yeah, just remember:

glTexImage* - Allocate and fill
glTexSubImage* - Fill only

You only need to allocate once.

I changed to glTexSubImage2D and installed Vista 64 again: issue solved!

I’m also not sure whether RGBA would really be that much faster, since the data is always packed to a multiple of 4 bytes anyway.

All good suggestions though :slight_smile:

Just FYI, modern hardware does not support RGB internal formats natively (it stores them as RGBA anyway). It probably doesn’t make any difference as long as you upload the data in a format whose rows are a multiple of 4 bytes, but sooner or later you’ll probably run into some picky driver that falls back to software rendering or something. Better safe than sorry :slight_smile:

Also note that an RGB/RGBA internal format is typically stored as BGR/BGRA. This means you’ll get a nice performance boost if you upload data as BGRA, because then the driver won’t have to perform the conversion for you.