Radeon / GeForce performance question

This thread is very funny.
I wonder if it belongs in “OpenGL advanced coding” but it’s funny.

Humus and Sylvain, get a life, you two. Terms like "Nvidiot" may be acceptable on some fanboyish site, but not here.

This thread officially sucks…

Jack

Yeah, it sucks somehow, but anyway, the stats show one simple fact:
either the Radeon is **** or the Radeon driver is ****…
I don't think the Radeon is ****, so it has to be the driver.

AND

the nVidia drivers now have new CopyTexSubImage routines; the old ones were much slower… and I know BigB uses the new drivers…

CopyTexSubImage means copying memory, and in the software renderer it's no big stress, because there it's in fact little more than a simple memcpy and doesn't take much time…

Originally posted by Gorg:
[b]I am not going to add to this senseless and very childish discussion, but I thought Matt was clear about extensions a while ago. This is how I understood it:

When a vendor creates an extension, they call it with their company ID. If other vendors come to them and say, "Hey! This extension is cool, I want to help with it," then it becomes an EXT.

If the review board says, "Hey, this extension is cool and generic enough," they will make it ARB.

Oh, and having an NV or ATI on an extension doesn't stop another vendor from using it.
[/b]

Sure, but if it's proprietary they can't implement it or promote it to ARB, regardless of how cool it is.
That just kills the whole idea of an open API.

Originally posted by JackM:
[b]Humus and Sylvain, get a life, you two. Terms like "Nvidiot" may be acceptable on some fanboyish site, but not here.

This thread officially sucks…

Jack[/b]

Sorry, but I just don't like it when people base their opinion on a single test and start bashing the card and/or the driver team without trying to find the source of the problem, which most likely lies somewhere within their own code. When I experience a problem, I try to find its source. If it's within my own code I correct it, and if it resides in the drivers I notify the driver development team. During the six months I've owned my Radeon, I've only twice found a problem that wasn't in my own code. Both times I notified the driver team and got good feedback. The first time it wasn't even a driver bug but a hardware limitation I was unaware of, and the second time the bug got fixed quite quickly.
You certainly can't expect the driver team to fix bugs they are unaware of.

When I posted this topic I was hoping that one of the ATI guys might pick up the demo and give some insight into the problem. The original question was a serious one.

Originally posted by davepermen:
[b]Yeah, it sucks somehow, but anyway, the stats show one simple fact:
either the Radeon is **** or the Radeon driver is ****…
I don't think the Radeon is ****, so it has to be the driver.

AND

the nVidia drivers now have new CopyTexSubImage routines; the old ones were much slower… and I know BigB uses the new drivers…

CopyTexSubImage means copying memory, and in the software renderer it's no big stress, because there it's in fact little more than a simple memcpy and doesn't take much time…[/b]

Neither the Radeon nor its drivers are ****. I haven't used glCopyTexSubImage myself, but since it seems to be really slow, I think the best solution would be to send a message to devrel@ati.com reporting the problem.

Originally posted by heeb:
When I posted this topic I was hoping that one of the ATI guys might pick up the demo and give some insight into the problem. The original question was a serious one.

I just picked up the code and found one problem, but it's not THE problem. You're grabbing a 512x512 texture out of a 640x480 window… that 512x512 doesn't fit into the window.

Originally posted by Humus:
Sure, but if it's proprietary they can't implement it or promote it to ARB, regardless of how cool it is.
That just kills the whole idea of an open API.

I'm not sure where you get that from. From what I understand, anybody can still implement it. And the ARB can ask the vendor to promote it to ARB, just like they did with DOT3.

OpenGL is open.

Originally posted by Humus:
I just picked up the code and found one problem, but it's not THE problem. You're grabbing a 512x512 texture out of a 640x480 window… that 512x512 doesn't fit into the window.

No, look at the start and end of the updateReflectionTexture function: I resize the viewport to 512x512 when rendering the reflection, then resize the viewport back to the current window size for the main rendering.
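For what it's worth, the order of operations described above would go roughly like this in C-flavoured pseudocode (the function name updateReflectionTexture comes from the post; the texture handle, the drawReflectedScene callback, and the copy parameters are my assumptions, not the actual demo source):

```c
/* Sketch of updateReflectionTexture() as described above -- not the
 * real demo code; reflectionTex and drawReflectedScene are assumed names. */
glViewport(0, 0, 512, 512);                /* shrink viewport for the pass */
drawReflectedScene();                      /* render the mirrored scene    */
glBindTexture(GL_TEXTURE_2D, reflectionTex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0,      /* framebuffer -> texture copy  */
                    0, 0, 0, 0, 512, 512);
glViewport(0, 0, winWidth, winHeight);     /* restore for the main pass    */
```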

Originally posted by heeb:
[b]Hi,

I have a demo app that creates a reflection on water by rendering the scene into a texture (using glCopyTexSubImage) and then blending it with the water surface, using texture projection to make the reflection distort as the water ripples. The frame rates on three different systems are shown below, which brings me to my questions:

Why is there a massive performance difference between the GeForce and Radeon cards? The bottleneck on the Radeon is the glCopyTexSubImage call (compare the Radeon performance with and without render-to-texture below).

Why is the software renderer's performance unaffected by the call to glCopyTexSubImage, while the Radeon's is?

Source and/or exe can be downloaded here if anyone wants to try it. www.futurenation.net/glbase
Both downloads are less than 300k.

Performance stats:

AMD Thunderbird 800 / GeForce 2 MX (by BigB)
Running at 1024x768/32 bits I got between 90 and 140 fps… very smooth
Running at 1024x768/16 bits I got between 25 and 30 fps…

Celeron 466 / Radeon 64DDR (my system)
1024x768/32 bits - 2 fps with render to texture on (250 fps with render to texture off)

Pentium 266 / no OpenGL acceleration (my laptop)
1024x768/32 bits - 1 fps with render to texture on (1 fps with render to texture off)

Thanks
Adrian[/b]

Without looking at your code, I can make the following suggestion: you might want to look at the pbuffer extension, which lets you render to an off-screen color buffer and then bind that buffer to a texture. I just finished a demo that does this and it is quite fast. Also, I downloaded and ran your binary, and I see that the reflection texture often contains pixels from my desktop (outside of the GL window)… so you may have issues you aren't aware of that make your program slow. --Chris

Originally posted by heeb:
No, look at the start and end of the updateReflectionTexture function: I resize the viewport to 512x512 when rendering the reflection, then resize the viewport back to the current window size for the main rendering.

It doesn't matter; you still need a framebuffer large enough to hold the texture, since the copy reads from the framebuffer. You didn't think it would resize the framebuffer for you each time, did you? That would kill performance.
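To put numbers on that: the copy source must lie entirely inside the framebuffer, so the usable texture edge is bounded by the smaller window dimension. A small helper of my own (not from the demo) makes the point:

```c
/* Largest power-of-two texture edge that fits inside a w x h framebuffer.
 * glCopyTexSubImage reads from the framebuffer, so a 512x512 source
 * rectangle cannot come out of a 640x480 window. */
int max_copy_tex_size(int w, int h)
{
    int limit = (w < h) ? w : h;  /* bounded by the smaller dimension */
    int size = 1;
    while (size * 2 <= limit)     /* biggest power of two <= limit    */
        size *= 2;
    return size;
}
```

For a 640x480 window this gives 256, while 1024x768 is the first common mode where a full 512x512 copy fits.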

Originally posted by Gorg:
[b]I'm not sure where you get that from. From what I understand, anybody can still implement it. And the ARB can ask the vendor to promote it to ARB, just like they did with DOT3.

OpenGL is open.[/b]

How come, then, that so few of nVidia's extensions have become ARB? I can hardly think they weren't cool enough. The DOT3 extension (which was invented by ATi) wasn't proprietary, so it could be promoted to ARB.

I have to agree with Gorg, Humus. Check out the extension registry, especially the links at the top. Extensions are just specifications, just like OpenGL is a specification. The ARB doesn't dictate how a specification has to be implemented internally, just that an implementation complies with the spec.

All the nVidia, ATI, SGI and other extension specifications are available to everyone in the extension registry, and any company that wants to implement them is free to do so, as long as it complies with the specification. I think the reason there are so many NV extensions is that nVidia is forward-thinking and interested in development. The reason so few NV extensions have become ARB is probably that the other companies don't want to implement them, either because their hardware may not support them well or because it's a lot of work to develop a new feature like that.

Anyway, have a read of the extension registry notes for more info.

[This message has been edited by ffish (edited 05-15-2001).]

Well, you tell me what "IP status: NVIDIA Proprietary" means. It's there on almost all their extensions.
Sure, nVidia is forward-thinking, but so is ATi. The only difference is that nVidia is locking their extensions down with legal crap to prevent others from implementing them. I wouldn't call that "open".

If you look at the ATI extensions, they are also marked ATI proprietary.

I believe nVidia's extensions haven't been promoted to ARB because they are too close to their hardware. Just look at register combiners: you really need hardware designed with the extension in mind. It is probably extremely difficult to implement that extension on hardware that wasn't designed for it.

ATi's extensions are simpler and higher level: DOT3, vertex streams, vertex blend.

I concede I might be wrong, but I don't think so at the moment. I'll look into the whole extension story when I have the time.

First of all… the demo is cool… I have a GF2 MX, so no speed problems here… it looks pretty nice…

Second:
I really like nVidia's straightforward low-level extensions… they are not simple to understand, but once you've got them you can get everything you want out of your GeForce, and very, very fast: compared with plain glBegin/glEnd OpenGL, optimized VAR gives me a boost of about 10-20 times, and with register combiners I can do a great deal in one pass… complete per-pixel diffuse and specular shading, which is great too… With the higher-level APIs you get the possibility to run on (nearly) every GPU, so you can do nice stuff for everyone, but the low-level ones let you do much nicer stuff on a specific board… I would like low-level extensions for ATi too… and nVidia could sometimes do high-level APIs themselves, because sometimes it's simply terrible sitting here reading specs and not understanding a word.

I think both do a great job. This CopyTexSubImage "bug" is bad, though… I think it's a big fault in the ATi driver.

But the demo looks pretty cool; it gave me a little feeling of the GF3 dot3-reflect stuff in the texture shaders… just a bit, but enough to like this type of reflection.

I spoke with the driver team about this. Currently the Radeon drivers only accelerate glCopyTexSubImage if the copy is from a pBuffer and you are using ARB_make_current_read.
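For anyone wanting to follow that advice, the accelerated path would go roughly like this (sketch only: pbuffer and context creation, extension checks and error handling are all elided, and the names windowDC, pbufferDC, glrc, reflectionTex and drawReflectedScene are my assumptions):

```c
/* 1. Render the reflection pass into a 512x512 pbuffer. */
wglMakeCurrent(pbufferDC, glrc);
drawReflectedScene();

/* 2. Keep drawing to the window but READ from the pbuffer
 *    (WGL_ARB_make_current_read), so the copy can be accelerated. */
wglMakeContextCurrentARB(windowDC, pbufferDC, glrc);
glBindTexture(GL_TEXTURE_2D, reflectionTex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, 512, 512);
```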

Good to know, Dave, but it doesn't sound too good. It means I'll have to use pbuffers if I want decent performance on the Radeon. Or maybe I'm wrong in assuming that other cards that don't support pbuffers do accelerate this code path. How does the Rage 128 driver handle this? Does it accelerate this function at all?

I'm not 100% sure, but I think pbuffers would be (at least slightly) faster than using the framebuffer on all cards that support them. I'd use pbuffers where supported and fall back to the framebuffer where they aren't.
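Agreed. For the fallback logic, the usual first step is checking the extension string. Here is a small helper of my own (not from any demo here) for testing a name against the list returned by glGetString(GL_EXTENSIONS); note that a bare strstr() is not safe, since one extension name can be a prefix of another:

```c
#include <string.h>

/* Return 1 if `name` appears as a complete, space-delimited token in the
 * extension string `extlist`, 0 otherwise.  A plain strstr() would report
 * "WGL_ARB_pbuffer" as present even if only a longer name that merely
 * starts with it were listed. */
int has_extension(const char *extlist, const char *name)
{
    size_t len = strlen(name);
    const char *p = extlist;

    while ((p = strstr(p, name)) != NULL) {
        int starts_ok = (p == extlist) || (p[-1] == ' ');
        int ends_ok   = (p[len] == '\0') || (p[len] == ' ');
        if (starts_ok && ends_ok)
            return 1;
        p += len;  /* partial match; keep scanning */
    }
    return 0;
}
```

With that in hand the render loop can pick the pbuffer path when has_extension(...) succeeds and fall back to the plain framebuffer copy otherwise.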