display list performance on radeon

I recently purchased a radeon graphics card and found it to be extremely slow with the program I am working on… in a certain “boring OpenGL 1.1 mode”, it runs at about 400 fps on my gf fx 5700, while on the radeon (9600 pro) it renders at 15-30 fps. The program makes heavy use of display lists. I remember reading that display lists are rather slow on ati hardware… is this true? Slow to such an extent, even though the chipset itself should actually be faster?

Jan

In Homeworld2 we found display lists on all ATI hardware to be slower than vertex arrays. nVidia, Matrox, and Intel all have good display list implementations with their latest drivers.

If you want fast performance on ATI hardware you’ll have to use VBOs until they optimize this part of their driver. If you stick to OpenGL 1.1 then I guess you just won’t get good performance on ATI.
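The basic VBO path is only a handful of calls. A minimal sketch (ARB_vertex_buffer_object era; the mesh layout and names are made up, and the entry points have to be fetched from the driver first):

```c
/* Minimal sketch of the ARB VBO path; mesh layout and names are made up.
   Entry points assumed loaded via wglGetProcAddress/glXGetProcAddressARB. */
#include <GL/gl.h>
#include <GL/glext.h>

extern PFNGLGENBUFFERSARBPROC glGenBuffersARB;
extern PFNGLBINDBUFFERARBPROC glBindBufferARB;
extern PFNGLBUFFERDATAARBPROC glBufferDataARB;

static GLuint vbo;

void upload_mesh(const float *verts, int nverts)   /* xyz per vertex */
{
    glGenBuffersARB(1, &vbo);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB,
                    (GLsizeiptrARB)(nverts * 3 * sizeof(float)),
                    verts, GL_STATIC_DRAW_ARB);
}

void draw_mesh(int nverts)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0); /* offset into VBO */
    glDrawArrays(GL_TRIANGLES, 0, nverts);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
}
```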

Use vertex arrays instead of display lists for a bit of a performance increase with current ATI drivers.

you are talking about “a bit” of a performance increase… but performance is about 10-20 times slower than on the gf fx with display lists. when using ARB_vertex_program and ARB_fragment_program on the ati, the situation does not change, it’s still the same speed, whilst on nvidia it slows down to about 60 fps… which still makes it about three times faster than the ati. So can the problem really be the display lists, if it’s not a bit slower but that much slower?

thanks
Jan

If your scene is rendered only with display lists, AND you have not implemented frustum culling in your program, then what you might be seeing is NVidia’s drivers doing frustum culling on the display lists.
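Doing the culling yourself per segment is cheap. A minimal sketch, assuming one bounding sphere per segment and frustum planes with inward-pointing normals (the plane-extraction step is omitted and the names are illustrative):

```c
/* Hypothetical per-segment cull: plane[i] = (a,b,c,d) with normals
   pointing inward; one bounding sphere (cx,cy,cz,r) per segment. */
typedef struct { float a, b, c, d; } Plane;

int sphere_visible(const Plane plane[6], float cx, float cy, float cz, float r)
{
    int i;
    for (i = 0; i < 6; ++i) {
        float dist = plane[i].a * cx + plane[i].b * cy
                   + plane[i].c * cz + plane[i].d;
        if (dist < -r)
            return 0;  /* completely outside this plane: cull the segment */
    }
    return 1;          /* inside or intersecting the frustum: draw it */
}
```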

Your radeon is fast. But in my experience, ATI cards are much more sensitive to “doing the correct thing” ™ than NVidia. A small mistake, an invalid vertex format, or a bad combination of enabled states: any of that can make your performance suffer a lot.

Y.

FYI: I couldn’t find the old HW2 numbers so I did some quick tests now with the 4.3 cats and a 9700Pro. Display lists weren’t slower than vertex arrays like I previously said (although we had found this with old drivers). VBOs beat both methods.

Ysaneya’s post matches my thoughts, too.

Originally posted by JanHH:
in a certain “boring OpenGL 1.1 mode”
Care to explain?
Are you talking about TexImage calls in display lists? I think you shouldn’t do that.

no teximage calls inside of display lists, all that happens there is:

  • tex coords
  • normals
  • vertex
  • enable/disable texture for certain tex units
  • bind texture
  • change color

and yes the program does frustum culling itself.
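The lists are built roughly like this, if it helps (a simplified sketch, not the actual code; only a single texture unit is shown and the data layout is made up):

```c
#include <GL/gl.h>

/* Hypothetical sketch of how one of these lists gets compiled; the texture
   id and the 8-float-per-vertex layout (uv, normal, position) are made up. */
GLuint build_segment_list(GLuint ground_tex, const float *v, int ntris)
{
    GLuint list = glGenLists(1);
    int i;

    glNewList(list, GL_COMPILE);
    glEnable(GL_TEXTURE_2D);                  /* enable texture for a unit */
    glBindTexture(GL_TEXTURE_2D, ground_tex); /* bind texture */
    glColor3f(1.0f, 1.0f, 1.0f);              /* change color */
    glBegin(GL_TRIANGLES);
    for (i = 0; i < ntris * 3; ++i, v += 8) {
        glTexCoord2f(v[0], v[1]);             /* tex coords */
        glNormal3f(v[2], v[3], v[4]);         /* normals */
        glVertex3f(v[5], v[6], v[7]);         /* vertex */
    }
    glEnd();
    glEndList();
    return list;
}
```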

Is there anything that has to be avoided on ati that works fine on nvidia?

“boring OpenGL 1.1 mode” means basically that it uses standard OpenGL lighting (and multitexturing, so it’s rather OpenGL 1.3 mode :wink: ), no vertex program and no fragment program, but I reworked it to use them for bump mapping. It still has the old mode available, and on ati both modes are very, very slow, while on NV, old mode = 200 fps and new mode = 60 fps.

I really have no idea what’s going wrong.

well, thanks for all your help :slight_smile:

Originally posted by JanHH:
no teximage calls inside of display lists, all that happens there is:

  • tex coords
  • normals
  • vertex

Same number of each? I seem to remember that knackered, IIRC, had some strange issues with display lists when he tried that.

Originally posted by JanHH:
  • enable/disable texture for certain tex units
  • bind texture

Try not to :slight_smile:

Originally posted by JanHH:
  • change color

Should be okay if you do it only a few times. If you do it very frequently, see above.

Originally posted by JanHH:
Is there anything that has to be avoided on ati that works fine on nvidia?
No idea. However, I can tell you what runs well for me (and has done so for years) with ATI drivers.

  • dlists containing texture environment state changes (glActiveTextureARB, glTexEnvi)
  • dlists containing pure geometry, all used attributes set for every vertex
  • dlists that start with a glColor call and contain geometry, but with no further glColor calls

I actually never tried mixing state (other than vertex attributes) and geometry in a display list; I just never had a usage model for that.
Maybe you should just split into multiple display lists, so that they are all “pure”.
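Something along these lines (a sketch only; the Material struct and its fields are invented for illustration):

```c
#include <GL/gl.h>

/* Sketch: keep binds and colors outside, call geometry-only lists.
   The Material struct and its fields are invented for illustration. */
typedef struct {
    GLuint  texture;        /* texture object for this material */
    GLfloat color[3];       /* per-material color, set once per batch */
    GLuint  geometry_list;  /* display list with ONLY vertex attributes */
} Material;

void draw_scene(const Material *mats, int nmats)
{
    int i;
    glEnable(GL_TEXTURE_2D);
    for (i = 0; i < nmats; ++i) {
        glBindTexture(GL_TEXTURE_2D, mats[i].texture); /* state outside */
        glColor3fv(mats[i].color);
        glCallList(mats[i].geometry_list);             /* “pure” list */
    }
}
```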

yes, it’s in fact planned to use a separate display list for every material, so that color and texture changes do not appear inside them. however, I cannot see anything logical about texture and color changes being faster outside of display lists than inside. and after all, there are only very few color and texture binding changes altogether, so this cannot be the problem. and remember, it is not a subtle difference: it is 400 fps on nvidia vs. 20 fps on ati. so there MUST be something going entirely wrong, I think.

hmm… what exactly does your data look like? the only time i had really extreme slowdowns (a 100th of vertex array speed) was with VBOs and unaligned data. maybe there are similar restrictions for display lists, so together with the advice above you could try the following:

specify normals etc. (except maybe color) for every vertex. use floats (or, if you use bytes for colors, use all 4 components). i tend to think ati loves to sacrifice flexibility for speed.
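i.e. a layout like this (just an illustration, the field names are made up):

```c
/* Illustrative interleaved layout: floats for everything except the color,
   which uses all 4 bytes so each vertex stays 4-byte aligned (36 bytes). */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
    unsigned char color[4];   /* RGBA: all 4 components, not 3 */
} Vertex;
```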

vertex, color and normal data are all floats, and specified for every vertex.

the data exists in two flavours: it’s a segmented ground mesh, one version with 6x6 (=36) segments, the other with 20x20 (=400) segments. Both are otherwise identical, same number of faces, same look, and both run at the same (slow) speed.

I just discovered that when using the vertex and fragment program it slows down even more, from 20 down to 14 fps, and all the fragment program does is fetch the texel color and write it to result.color.
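For clarity, a minimal equivalent of that fragment program (not my exact code; the function and variable names are illustrative):

```c
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Entry points assumed loaded via wglGetProcAddress/glXGetProcAddressARB. */
extern PFNGLGENPROGRAMSARBPROC   glGenProgramsARB;
extern PFNGLBINDPROGRAMARBPROC   glBindProgramARB;
extern PFNGLPROGRAMSTRINGARBPROC glProgramStringARB;

/* Fetch the texel and write it to result.color, nothing else. */
static const char *fp_src =
    "!!ARBfp1.0\n"
    "TEMP texel;\n"
    "TEX texel, fragment.texcoord[0], texture[0], 2D;\n"
    "MOV result.color, texel;\n"
    "END\n";

void load_pass_through_fp(void)
{
    GLuint prog;
    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(fp_src), fp_src);
}
```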

So there MUST be something wrong… do you at least agree with me on this?

Jan

Colors are floats? That should ring an alarm. It’s not a “standard” vertex format; you should fix this anyway.

How many polygons per segment?

If you see no difference at all between 36 and 400 segments, that should ring an alarm a second time: the problem is then likely not a vertex format/transfer/bandwidth problem. I’m guessing you don’t see a difference if you use vertex arrays or even immediate mode, do you?
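Trying the plain vertex array path takes only a few lines, e.g. (a sketch; the array names are made up):

```c
#include <GL/gl.h>

/* Sketch of the plain (client-memory) vertex array path, for comparing
   against the display list numbers; array names are illustrative. */
void draw_with_arrays(const float *pos, const float *nrm,
                      const float *uv, int nverts)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glVertexPointer(3, GL_FLOAT, 0, pos);
    glNormalPointer(GL_FLOAT, 0, nrm);
    glTexCoordPointer(2, GL_FLOAT, 0, uv);

    glDrawArrays(GL_TRIANGLES, 0, nverts);

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```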

Y.

I think floats as colors are considered standard on ATI as well (I only use that, and I don’t have any speed problems).

Have you tried just using VBOs? Just to see if you get the same strange results.

216618 vertices in total, which makes an average of

6017 vertices per segment in the 6x6 version and

542 vertices per segment in the 20x20 version.

No, I did not try changing this to immediate mode or vertex arrays/VBO because that would be far too much work, and the program running on ati is “nice to have”, but in general we are content with nvidia. So it’s more my personal interest (and at least I spent EUR 139,- on the radeon :wink: ) than a necessity to get it running on ati hardware.

I was also surprised at ATI’s performance in “dumb” OpenGL modes (no shaders, …) when I first popped a 9800XT into our renderer. The GeForce FX 5950 proved to be faster by factors of 3-7, depending on the draw mode.

I didn’t have a VBO implementation available, but for everything else, ATI was extremely sensitive to the way you send your geometry (triangle strips or not, indexed geometry or not, etc.), while the Geforce provided excellent performance across the board.

And I wasn’t even able to compare display list performance: the ATI bailed out with “out of memory” for a 1 million polygon mesh, whereas the Geforce rendered 2 million polygons with display lists just beautifully.

I assume that with VBO, you can get the same level of performance on the ATI, but I have to say that for normal “dumb” OpenGL operation, NVIDIA’s drivers are much better…

Michael

This does not sound very positive for ati… if the radeon chip is that sensitive to all kinds of things and the geforce does a much better job all in all, it’s the exact opposite of the ati hype in game magazines, where you get the impression that the geforce fx is really a “loser” chipset and ati is much better. But it seems that the radeon is a tailor-made direct3d chip and is not that convincing in other situations, whilst the geforce fx is not as bad as those game magazines say.

Think of it like this: ATi’s first priority over the last year has been to stabilize their drivers. So where do you put your optimizations, given limited development time? On code paths that are already in use or are likely to be used in the future: VBO, VAO (when it mattered), vertex arrays for the formats you support. So what if your card doesn’t handle display lists as fast as it could? That doesn’t matter, because DLs aren’t in wide use among actual professional developers. Not to say that some don’t use them, but those who don’t outnumber those who do. Greatest good for the greatest number.

So they optimize what makes important, actually functioning programs run fast, as well as the APIs that programs are expected to use in the future (VBO).

It isn’t that the Radeon is a D3D chip; it can beat nVidia chips in GL programs as well. It’s just a matter of GL having numerous data-transfer APIs to optimize, while D3D has only one, so you pick the ones that you need. And Radeons still handily beat NV3x chips at floating-point fragment shaders, so if you’re doing something advanced, you’ll still want your Radeon.

Basically, the R3xx’s, and their drivers, are made for actual game developers and gamers first, and everyone else second.

So, if you find the sub-optimal path for sending vertices to be slow on your Radeons, stop using the sub-optimal path, and start using something real.

Originally posted by JanHH:
This does not sound very positive for ati… if the radeon chip is that sensitive to all kinds of things and the geforce does a much better job all in all, it’s the exact opposite of the ati hype in game magazines, where you get the impression that the geforce fx is really a “loser” chipset and ati is much better.

ati is fast, but many of its optimizations seem to follow a “do it our way or live with the consequences” policy: a few “right” ways that work well and many “wrong” ways that kill performance (and on some occasions crash your system).

Originally posted by JanHH:
But it seems that the radeon is a tailor-made direct3d chip and is not that convincing in other situations, whilst the geforce fx is not as bad as those game magazines say.

i don’t know if the whole hardware is “aimed” at d3d, but at least point sprites and a few other issues make me feel that d3d comes first when they write drivers. the mentioned point sprites even make me wonder if they screw up opengl support on purpose.

seems like a draw: more performance but less flexibility on the one side, or a less touchy driver but less peak performance on the other.

Well, I’ll stick with nvidia then; I am quite content with NV performance when using NV_vertex/fragment_program instead of the ARB versions. And isn’t it the case that nvidia IS doing a better job with their drivers, if they support every path, whilst ati only supports “their” path?

I disagree with “more flexible, less speed (nvidia) vs. less flexible but more speed (ati)”. I don’t think that the one preferred path which is meant to be the fastest has to become slower just because other paths are supported as well. It’s just more work for the driver writers, I guess :wink: . Or maybe also for the hardware designers, to create a chipset that is more flexible.

After all, I think this s*cks, sorry… one could wonder whether what ati ships counts as a “certified” OpenGL implementation at all, if it goes like, “hey, you can use display lists too, but it will hardly be faster than simply drawing with a software renderer” (which nearly seems to be the case). Maybe consistent speed throughout most of the implementation should be a criterion, too, as well as support for all features.

Jan

Originally posted by JanHH:
And isn’t it the case that nvidia IS doing a better job with their drivers, if they support every path, whilst ati only supports “their” path?

And did you know that clipping planes are done in software on NVidia cards, while they are in hardware on ATIs? Not to start a flamewar, but you can find tons of differences between these two families of video cards, and after having worked for quite a while with both, it is honestly my opinion that there is no black and white; neither is superior to the other. If you’re a serious developer you’ll have to live with it and learn the differences…

Y.