Display lists in 3.1

Are you being deliberately dense?

If you read my posts from earlier in this very thread, or have any kind of OpenGL experience, you’ll be aware of the various Triangle Strip, Fan, etc. configurations and also Indexing. Both offer faster ways to put text, or any kind of “QUAD”, on screen than GL_QUADS. Particularly with long lines of text, those two methods offer a better alternative to QUADS, and always have done.

Using indexing or stripping, you can send the same 50 characters that a QUAD system would send to the screen with half (or fewer) the vertices. On any HW that is going to be faster.

The point is you don’t need to send even 4 vertices per QUAD if you use methods like Strips and Fans. So both your and Mark’s understanding of these formats seems to be lacking, either deliberately or through pure ignorance.
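
To put rough numbers on that claim (back-of-the-envelope only, and it assumes the glyph quads really do share edges, e.g. a gap-free fixed-pitch layout):

```c
/* Illustrative arithmetic, not a benchmark: a 50-character run where
 * neighbouring glyph quads share an edge. */
enum { GLYPHS = 50 };

int quads_verts   = 4 * GLYPHS;      /* GL_QUADS: 4 vertices per glyph        = 200 */
int strip_verts   = 2 * GLYPHS + 2;  /* one GL_TRIANGLE_STRIP over the run    = 102 */
int indexed_verts = 2 * GLYPHS + 2;  /* glDrawElements: shared corners once   = 102 */
int indexed_idx   = 6 * GLYPHS;      /* plus 6 indices per glyph              = 300 */
```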

There is a reason why in Geometry Shaders you don’t have QUADS either btw.
There is also a reason why benchmarking is done using TRIs as well.
Think about it for a second.

If you delve a little deeper then you might even discover Geometry Shaders and realize that you could even do text with a single vertex. But that’s obviously not available on all HW.
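
The idea looks roughly like this; a sketch only (untested, GL 3.2-class geometry shader support assumed, and the attribute names are invented):

```c
/* Hypothetical glyph expansion: one input point per character, blown up
 * into a 4-vertex triangle strip entirely on the GPU. */
static const char *glyph_geometry_shader =
    "#version 150\n"
    "layout(points) in;\n"
    "layout(triangle_strip, max_vertices = 4) out;\n"
    "in vec2 vSize[];\n"      /* glyph width/height, from the vertex shader */
    "in vec4 vTexRect[];\n"   /* atlas rectangle: (u0, v0, u1, v1)          */
    "out vec2 gTexCoord;\n"
    "void main() {\n"
    "    vec4 p = gl_in[0].gl_Position;\n"
    "    vec2 s = vSize[0];\n"
    "    vec4 r = vTexRect[0];\n"
    "    gl_Position = p;                            gTexCoord = r.xy; EmitVertex();\n"
    "    gl_Position = p + vec4(s.x, 0.0, 0.0, 0.0); gTexCoord = r.zy; EmitVertex();\n"
    "    gl_Position = p + vec4(0.0, s.y, 0.0, 0.0); gTexCoord = r.xw; EmitVertex();\n"
    "    gl_Position = p + vec4(s.x, s.y, 0.0, 0.0); gTexCoord = r.zw; EmitVertex();\n"
    "    EndPrimitive();\n"
    "}\n";
```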

Getting rid of QUADS might be unpopular, but it is a good thing in the long run, as it makes you look at what the GPU is meant to work with as a raw format, and pushes you in a direction which is going to give you more efficient results.

Are you really going to take your lead from someone who recommends using the old glBitmap commands!?!?

RTFM! :wink:

Using indexing or stripping, you can send the same 50 characters that a QUAD system would send to the screen with half (or fewer) the vertices. On any HW that is going to be faster.

I understand now. You are speaking under the belief that all fonts are fixed-width. You assume that a run of text will result in a sequence of quadrilaterals that all share edges with one another, with no breaks in between glyphs.

This is not the case.

Any glyphs generated from variable-width fonts will not share edges. Ligatures, kerning, and other text formatting effects will see to that. Glyphs can overlap or have large gaps between them. This is designed into the font and must be respected for best visual results.

If you limit yourself to fixed-width fonts of Latin-1 languages, then your statement may be correct.
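
To make that concrete with made-up metrics: say an ‘A’ has a left bearing of 1 px, a width of 14 px, and an advance of 15 px, and the ‘V’ that follows it is kerned in by 2 px with a left bearing of 0. The right edge of the ‘A’ quad then sits at pen + 15, while the left edge of the ‘V’ quad sits at pen + 13. The two quads overlap instead of sharing vertices, so there is nothing for a strip or an index buffer to share.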

If you delve a little deeper then you might even discover Geometry Shaders and realize that you could even do text with a single vertex. But that’s obviously not available on all HW.

The key question in this discussion was performance. Geometry shaders cost performance. The number of vertices would decrease only at the expense of increasing the overall render time.

Are you really going to take your lead from someone who recommends using the old glBitmap commands!?!?

This part of the discussion is not about trusting Mark Kilgard or you. It is about what is correct. Just as Mark is incorrect about using glBitmap, you are incorrect about triangle strips being more effective in text rendering.

[quote=Alfonse Reinheart]

I understand now. You are speaking under the belief that all fonts are fixed-width. You assume that a run of text will result in a sequence of quadrilaterals that all share edges with one another, with no breaks in between glyphs.

Nope. Not at all.

One single instance of a QUAD drawn as a TRIANGLE_STRIP uses exactly the same number of vertices as a QUAD. Period.

There is a direct comparison for you. None of your qualifications.

Not a good real world example, but nonetheless the same.

If you want to do varying widths, then indexing and many other methods are available: degenerate triangles and so on… You just have to be familiar with them.
And in most, if not all, cases there are ways to solve the problems you pose and save data transfer / processing power.
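
For example, one common way to stitch two separate glyph quads into a single strip is to repeat a couple of indices at the join (a sketch, assuming strip-ordered vertex data is already set up and bound):

```c
#include <GL/gl.h>

/* Two disjoint glyph quads drawn as ONE GL_TRIANGLE_STRIP by repeating
 * indices ("degenerate", zero-area triangles) at the join. Vertices 0..3
 * belong to the first glyph, 4..7 to the second; per quad the strip order
 * is bottom-left, bottom-right, top-left, top-right. */
static const GLushort glyph_indices[] = {
    0, 1, 2, 3,   /* first glyph as a 4-vertex strip     */
    3, 4,         /* repeated indices bridging the gap   */
    4, 5, 6, 7    /* second glyph                        */
};

static void draw_two_glyphs(void)
{
    /* Index data shown client-side for brevity; this issues 4 real
     * triangles and 4 zero-area ones that the GPU rejects trivially. */
    glDrawElements(GL_TRIANGLE_STRIP, 10, GL_UNSIGNED_SHORT, glyph_indices);
}
```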

I have mentioned all these methods to you already in various places, and you choose to latch onto the strip example as the one and only case. That is as blinkered as believing that QUADS are the panacea for all ills. They are not, and they can be easily and exactly replicated using the same or even fewer vertices.

AFAIK that is exactly what the hardware / drivers have been doing behind the scenes for a long time. Only now it is exposed to us and we are asked to take some time to think about it on our end, and use more native formats for our geometry.
A good idea IMO.

Now back to your Geometry shader comment.
The whole point of sending a single vertex and then building the geometry on the GPU is exactly to limit client-server bandwidth usage, which is what your complaint and Mark’s were specifically about, since you are talking about the number of vertices sent.

As for the allocation or usage of horsepower on the CPU end or GPU end, what do you think happens on the GPU end when you submit a QUAD?

Someone sent an alert to the moderating team about Mark’s post, on the assumption that the poster wasn’t really Mark Kilgard. Just to make things clear: as Alfonse Reinheart pointed out earlier in this topic, all the evidence shows that the post comes from the real Mark Kilgard.

Now, I really understand. You think that GL_QUADS is not directly supported by hardware. That every draw call with GL_QUADS causes a software routine to run that converts the data into GL_TRIANGLES or some other format.

You are mistaken.

The piece of hardware that reads from the post-T&L cache understands GL_QUADS. It reads 3 vertices from the cache and outputs a triangle made of them, then reads a fourth vertex and outputs a second triangle made of vertices 1, 3, and 4. Then it advances past those 4 vertices and repeats.

This same piece of hardware understands GL_TRIANGLES (read three, emit a triangle, repeat), GL_TRIANGLE_STRIP (read 3, emit a triangle, advance 1 vertex, repeat), GL_TRIANGLE_FAN, and the rest.

This hardware has existed for nearly a decade, ever since hardware T&L came to be. Every piece of T&L-based hardware must have something that converts triangle strips and fans into a list of triangles. It is in no way difficult to add modes to this hardware that handle quad lists.
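
A toy model of that assembly stage, in plain C (this is not any vendor’s actual hardware, just the read/emit pattern described above; fetch() and emit() are stand-ins):

```c
#include <stdio.h>

enum { MODE_TRIANGLES, MODE_TRIANGLE_STRIP, MODE_QUADS };

static int  fetch(int i)              { return i; }                          /* read from post-T&L cache */
static void emit(int a, int b, int c) { printf("tri %d %d %d\n", a, b, c); } /* hand a triangle onward   */

static void assemble(int mode, int count)
{
    switch (mode) {
    case MODE_TRIANGLES:                    /* read 3, emit, advance 3 */
        for (int i = 0; i + 2 < count; i += 3)
            emit(fetch(i), fetch(i + 1), fetch(i + 2));
        break;
    case MODE_TRIANGLE_STRIP:               /* read 3, emit, advance 1 (winding alternates) */
        for (int i = 0; i + 2 < count; i += 1)
            emit(fetch(i), fetch(i + 1), fetch(i + 2));
        break;
    case MODE_QUADS:                        /* 4 vertices in, 2 triangles out, advance 4 */
        for (int i = 0; i + 3 < count; i += 4) {
            emit(fetch(i), fetch(i + 1), fetch(i + 2));
            emit(fetch(i), fetch(i + 2), fetch(i + 3));
        }
        break;
    }
}
```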

You may ask for evidence of this. My evidence is an old extension: NV_vertex_array_range. This is an NVIDIA extension, an ancient, very low-level precursor to buffer objects.

Near the end of the specification are implementation notes for NVIDIA hardware of the time. It lists the appropriate vertex formats that are acceptable on their hardware. It lists things like vertex attribute sizes for specific attributes, particular vertex formats, and so forth. It lists specific limitations for specific hardware. The NV10, the GeForce 256, is specifically mentioned. It has a number of very strict vertex format limitations.

Nowhere on that comprehensive list of limitations will you find a prohibition on the use of GL_QUADS. If NVIDIA had implemented GL_QUADS as a software layer, reading the vertex data and turning it into GL_TRIANGLES or some other form, then they would have stated a prohibition against using GL_QUADS, because reading vertex data back from VAR memory would be very damaging to performance.

No such limitation is spoken of. This is because no such limitation exists. And the only way that can be is if the hardware natively supported GL_QUADS. I even found an old demo program I downloaded years ago for VAR. It uses GL_QUAD_STRIP exclusively.

Not everything that was deprecated by the ARB was unsupported in hardware.

Is there some proof that all OpenGL implementations efficiently support GL_QUADS?

GL_QUAD_STRIP is different because it almost directly maps to GL_TRIANGLE_STRIP (except in wireframe mode, and rounded down to an even number of vertices).
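
A quick illustration of that mapping, assuming a current context and a 2*N-vertex zig-zag ribbon already bound:

```c
#include <GL/gl.h>

enum { N = 16 };   /* number of vertex pairs along the ribbon */

/* Same interleaved zig-zag vertex data, drawn either way; the filled
 * coverage is identical, only the wireframe decomposition differs. */
void draw_as_quad_strip(void) { glDrawArrays(GL_QUAD_STRIP,     0, 2 * N); }  /* N-1 quads         */
void draw_as_tri_strip(void)  { glDrawArrays(GL_TRIANGLE_STRIP, 0, 2 * N); }  /* 2*(N-1) triangles */
```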

Look, comrades, it was a Friday night, and Mark was surely exhausted after a hard week… So he had a few beers or something and wrote an emotional post. He’s not a robot; he has his own points of view, so what are we discussing? As far as I’m concerned, he could even have written “Damn, I hate GL, it suxx, use DX, it rulezz!”, and that would be his point of view. It’s a free forum, not a Khronos corporate mailbox.
The deprecation mechanism is here with us, and we can’t undo it, so we must deal with it. I also worry about driver efficiency with two code paths being supported, but anyway, it’s not my headache! It’s the driver developers who must take care of it, and I believe the NV and ATI guys will do their best.
About quad strips vs. tri strips: my tests a year ago showed NO performance difference between the two. The test was fairly representative for me: big rectangular grids of 255*255 vertices. And I saw no difference.
Tri strips are much clearer to me, because I have full control over the diagonal edge. With a quad, I don’t know where it’s supposed to be. And the triangle is the one primitive with well-defined attribute interpolation; a quad is not.

Reading all these posts debating quads, I wonder whether it’s worth the effort. I mean, who uses quads en masse in their renderer, other than for postprocessing (fullscreen quads), HUD, or font rendering?
IMO, all this stuff is negligible in computation time compared to the rest of the scene rendering. I think I need some clarification about your passion for quads. :slight_smile:
(Please stay calm, I am not criticizing anybody, quad fans or not :slight_smile: , I just need your insight.)

Amen to that. Unless the bandwidth overhead of the extra vertex indices is the bottleneck for rendering QUADS as TRIS for someone, this whole discussion makes no sense to me. I’ve never observed index bandwidth to be the bottleneck.

I mean, who uses quads en masse in their renderer, other than for postprocessing (fullscreen quads), HUD, or font rendering?

We use 'em. Impostors, light points, etc. But for convenience, not because there’s some coveted perf advantage we know of.

I think I need some clarification about your passion for quads. :slight_smile:

Yeah, I missed the whole point too. But I fear we’re getting off-topic, that being a merciless flame of Mark for stating his opinion. Which I wish would just stop.

Next thread…

I can see that DLs are a good idea for multithreaded drivers. However, the CURRENT DLs are a pain in the ass, always have been, and I am sooo happy that they have finally been kicked out.

HOWEVER, now that this abomination is gone, it might be a good idea to re-invent them in a way that can be supported easily by all vendors and is guaranteed to give good performance.

There has been another thread about it, and I have already given a few ideas, but now is the time for NV, ATI and the ARB to decide how to continue with this.

Oh, and thanks to nVidia and ARB_compatibility we have actually entirely “broken” GL 3 implementations today. nVidia might have the best OpenGL implementation, but they are absolutely reckless and don’t care much about the future of OpenGL as long as they are “the best” (well not as bad as ATI and Intel…) today.

And I think that Mark’s comment shows nVidia’s view on OpenGL 3 pretty well. Even if it is his personal view, it is definitely influenced by the general opinion at nVidia.

Jan.

If I’m NV I’m probably thinking

  • If it ain’t broke, don’t fix it!
  • Why rock the boat?
  • Why make things easier for the competition?
  • What’s to be gained by streamlining the API in areas where we have a considerable investment, proven track record of stability and speed, and the customers to prove it?
  • Why throw the baby out with the bathwater?
  • Where’s the real beef?
  • Demonstrated and continued leadership in propelling the API forward virtually unfettered by concerns over, say, “QUADS”.

Developers are always on the lookout for ways to make their lives easier down the road, but we all know full well that’s not what makes the (fiscal) world go 'round.

Back to tendin’ my biscuits…

P.S. Sorry for the OT but I can’t resist playing devil’s advocate. :wink:

Good Grief.

Nope I don’t think that QUADS drop you off the fast path.

We could go on all day… or all weekend and into the week. Oh wait, we did!
I do appreciate your comments, and those of others.

Oh, and thanks to nVidia and ARB_compatibility we have actually entirely “broken” GL 3 implementations today.

In what way does ARB_compatibility break NVIDIA’s GL 3 implementation?

If I’m NV I’m probably thinking

No, what NVIDIA wants is NvidiaGL.

The best example of this is the bindless graphics extension. The claim is made that bindless graphics is the best way to achieve fast performance on NVIDIA hardware.

If NVIDIA’s goal is to subvert OpenGL and convert it into NvidiaGL, what better way than with bindless graphics? To use it, you have to write special shaders that are entirely incompatible with regular shaders. To use it, you have to write rendering code that is entirely incompatible with previous rendering code. To support non-bindless and bindless in a single application, you must write and maintain 2 copies of all vertex shaders.

Adding to that is the issue of Vertex Array Objects and Uniform Buffer Objects. These should provide fast, efficient ways to change state and render primitives. Yes, bindless should be faster, but it should not be 7x faster. With NVIDIA emphasizing bindless graphics, what incentive does NVIDIA have to optimize these codepaths? All they need to do is create a self-fulfilling prophecy, that bindless graphics is much faster than the alternatives.

Id Software can afford to maintain 2 shader stacks. As can Blizzard and many other large developers. What do the rest of us do? Accept the needlessly 7x slower VAO and UBO method?

Nope I don’t think that QUADS drop you off the fast path.

Then how can QUADS possibly be slower than indexed triangle strips? Indexed strips send more data overall; on top of the vertices they have to send indices. Font rendering does not have the vertex sharing that indexed strip rendering needs to be more efficient. Every letter would need a degenerate strip connecting it to the next. So every letter would take 4 indices for the letter, and 4 indices for the degenerate strips connecting them.
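
To put rough numbers on it (and granting the most favourable stitching, with only two repeated indices per join rather than four): 50 letters as GL_QUADS is 200 vertices and zero indices. As one stitched indexed strip it is the same 200 vertices, since the glyphs share nothing, plus roughly 300 indices; as indexed GL_TRIANGLES it is 200 vertices plus 300 indices. The indexed paths only win when vertices are actually shared, and glyph quads do not share them.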

This is a bit surreal.
I like the cut of Mark’s jib, but I don’t agree with most of what he’s said. Should have been a clean break. Feature freeze old GL, introduce new API - this hybrid nonsense is bad. I agree with the stuff he said about dlists, but only for geometry, and only as a means of letting the driver put the geometry into optimal format. If I want that done in another thread, it should be up to me - not the driver. I’m best placed to decide how to use my CPUs; the driver should stick to optimising stuff for the GPU.

> If you are really the Mark Kilgard, I have to say, I’m rather shocked by your suggestions. In one of your recent postings, you said that “The Beast has now 666 entry points”. Do you really believe that a 666-function (and growing!) API is easier to maintain and extend than a more lightweight one?

I think (scratch that), I know that the size of the API (whether 20, 666, or 2000 commands) has little to do with how easy it is to maintain and extend a 3D API. Does that shock you? It might; I’ve worked on OpenGL for 18 years, so I approach your question with a good deal of accumulated experience and even, dare I say, expertise on the subject.

I don’t think API entry point count has much, if really anything, to do with maintainability of an OpenGL implementation. It has far more to do with 1) investment in detailed regression testing, 2) hiring, retaining, and encouraging first-rate 3D architects, hardware designers, and driver engineers, 3) clean, well-written specifications, and 4) a profitable business enterprise that can sustain the prior three.

Those are the key four factors. I could probably list more if you forced me to do so, but those are really the four key factors. If you forced me to list 20 more, I’m confident API size would still not make my list.

> nVidia and ATI are maybe the most important contributors to GL3.0+. If you seriously believe that removing DLs and GL_QUADS is a bad thing, why didn’t you prevent it back then?

I thought it was a poor course of action then; I think it’s a poor course of action now. I’ve done my best to prevent deprecation from hurting the OpenGL ecosystem. Deprecation exists, but I consider it to be basically a side-show.

NVIDIA doesn’t remove and won’t remove GL_QUADS or GL_QUAD_STRIP or display lists (or any of the so-called deprecated features). These features all just work. Obviously our underlying GPU hardware does (and will always) support quads, etc.

Now if YOU want to avoid these features because YOU think (or someone else has convinced you) these fully operational features are icky or non-modern, go ahead and don’t use them. But nobody has to stop using them, particularly if they find them useful/fast/efficient/convenient or simply already implemented in their existing code base. NVIDIA intends to keep all these features useful, fast, efficient, convenient, and working.

The problem is that someone’s judgment (be they app developer, driver implementer, or whatever) of what is good and bad in the API probably doesn’t match the judgment of others. My years of experience inform me that people tend to consider features they personally don’t happen to use as “non-essential” and ready fodder for deprecation. The fact that other OpenGL users may consider these same features totally essential and have built substantial chunks of their application around the particular feature you consider non-essential probably doesn’t matter much to you; I assure you the person or organization or industry relying on said feature feels differently.

What you might not appreciate (though I do!) is that this unspecified “other user” may be the one that does far more than you to sustain the business model that supports OpenGL’s continued development. CAD vendors used to say (this is less so now) they didn’t care about texture mapping; game developers would say they don’t care about line stipple or display lists.

For good reason, the marketplace doesn’t really let you buy a “CAD GPU design” or “volume rendering GPU design” or “game GPU design” tailored just for CAD, volume rendering, or gaming; the same efficient, inexpensive GPU design can do ALL these things (and more!) and there’s no specialized GPU design on the market that can do CAD (or volume rendering or gaming) better than the general-purpose GPU design.

That said, a particular product line (such as Quadro for CAD) can be and is tailored for the demands of high-end CAD and content creation, but the 3D API rendering feature set (what is actually supported by OpenGL) is the SAME for a GeForce intended for consumer applications and gaming. In the same way, when GeForce products are tailored for over-clocking and awesome multi-GPU configurations, that’s simply tailoring the product for gaming enthusiasts. This is much the same way there’s not a CPU instruction set for web browsing and different instruction set for accounting.

There’s a fallacy that if somehow the GPU stopped doing texture mapping well it would run CAD applications better; or if the GPU stopped doing line stipple (or quads or display lists), the GPU would magically play games faster. In isolation, the cost of any of these features is pretty negligible, and certainly the subtraction of one feature won’t improve another, different feature. There have also been repeated examples of “unexpected providence” in the OpenGL API, where a feature such as stencil testing, designed originally for CAD applications to use for constructive solid geometry and interference detection, gets used to generate shadows in a game such as Doom 3 or Chronicles of Riddick.

Said another way, if I concentrated on just the features of OpenGL YOU care about, I would likely NOT have a viable technical/economic model to sustain OpenGL. It’s probably also true that if I just concentrated on the features of unspecified “other user” of OpenGL, I would also NOT have a viable technical/economic model to sustain OpenGL. But in combination, the multitude of features, performance, and capacity requirements of the sum total of 3D application development create a value-creating economic environment that sustains OpenGL in a way that benefits all parties involved.

Knowing this to be true, how do you expect that “zero’ing out” features by deprecation is going to suddenly make other features better or faster? There’s a knee-jerk answer: duh, well, if company Z doesn’t have to work on feature A anymore, they will finally have the time/resources to properly implement feature B.

But that doesn’t hold up to scrutiny. Almost all of the features listed for deprecation have been in OpenGL since OpenGL 1.0 (1992). If the features were simple enough to implement in hardware 17 years ago, and now you have over 200x more transistors for graphics than back then, was it really the complexity of some feature that has saddled company Z’s OpenGL implementation for all these years? Give me a break.

Moreover, feature A and feature B are very likely completely independent features with almost nothing to do with each other. Then you can’t claim feature A is making feature B hard to implement.

> Existing (old) APIs can use the old OpenGL features. But you should not encourage people to use these old OpenGL features in their new, yet to be created APIs and applications.

I encourage anyone using OpenGL to use any feature within the API, old or new, that meets their needs.

If you think I’m going to be going around telling NVIDIA’s partners and customers (or anyone using OpenGL) what features of OpenGL they should not be using, you are sadly mistaken.

Developers are free to use old and new features of OpenGL and they should rightfully be able to expect the features to interact correctly, operate efficiently, and perform robustly. Why would I (or they) want anything less than that?

I think it is wholly unreasonable to tell developer A that in order to use new feature Z, developer A is going to have to stop using old features B, C, D, E, F, G, H, I, J, K… (the list of deprecated feature is long) that have nothing to do with feature Z.

This isn’t to say that I want OpenGL to be stagnant. Far from it, I’ve worked hard to modernize OpenGL for the last decade. I wrote and implemented the first specification for highly configurable fragment shading (register combiners), specified the new texture targets for cube mapping, specified the first programmable extension for vertex processing using a textual shader representation, played an early role (and continue to do so) developing a portable, high-level C-like language (Cg) for shaders, specified and implemented support for rectangle and non-power-of-two textures, implemented the driver-side support for GLSL and OpenGL 2.0 API for NVIDIA, and more recently worked to eliminate annoying selectors from the OpenGL API with the EXT_direct_state_access extension. Before any of this, I wrote GLUT to help popularize OpenGL.

All in all, I’m pretty committed to OpenGL’s success. If I thought deprecation would make OpenGL more successful, I’d be all for it (but that’s entirely NOT the case). Instead, I think deprecation is on-balance bad for or, at best, irrelevant to OpenGL’s future development and success.

I’m really proud of what our industry (and the participants on opengl.org specifically) have managed to create with OpenGL. Arguably, source code implementing 3D graphics is MORE portable across varied computing platforms than code to implement user interfaces, 2D graphics, or any other type of digital media processing. That’s amazing.

But deprecation in OpenGL is an unfortunate side-show. It’s a distraction. It gives other OpenGL implementers an excuse for foisting poorly performing and buggy OpenGL implementations on the industry; they (wrongly) get off lightly from you developers by employing a “blame the API” strategy that places the costs of deprecation wholly on YOU rather than them just properly designing, implementing, and testing good OpenGL implementations.

Deprecation asks You All (the sum total of OpenGL developers out there) to solve Their Problem which is they refuse to devote the time and engineering resources to robustly implement OpenGL properly; instead, they blame the API and hope You All will re-code All Your applications to avoid the simpler solution of Them simply properly implementing their own OpenGL implementation.

Trust me, API size is NOT at the core of why these problem implementations are poor (go back to the four factors I listed earlier…). Attempts to “blame the API” for what are clearly faults in their implementation don’t fix any root causes.

As an OpenGL developer, rather than poorly utilizing your time trying to convert your code to avoid deprecated features, you would be better served sending a loud-and-clear message that you expect OpenGL to be implemented fully, efficiently, and robustly.

  • Mark

that’s all very well and good, but We don’t have the market share to influence support of a minority API. We either have to continue to work around Their awful implementations of an undeniably complicated API, or We move to D3D if We have the choice (which I don’t). Thanks for your understanding, Mark. How did you get so much ivory to make that tower?

I don’t think API entry point count has much, if really anything, to do with maintainability of an OpenGL implementation. It has far more to do with 1) investment in detailed regression testing, 2) hiring, retaining, and encouraging first-rate 3D architects, hardware designers, and driver engineers, 3) clean, well-written specifications, and 4) a profitable business enterprise that can sustain the prior three.

It is a truism of software engineering that the larger your codebase is, the more people and effort you need to maintain and extend it. The larger your codebase, the more investment in detailed regression testing you need. The more hiring, retaining, and encouraging first-rate 3D architects, hardware designers, and driver engineers you need. And so on.

In short, the larger the codebase that an OpenGL implementation requires, the more money it takes to build and maintain it.

This is the reason why Intel’s Direct3D drivers are pretty decent, while their OpenGL drivers are terrible. OpenGL implementations simply require more effort, and Intel does not see any profit in expending that effort.

Therefore, if OpenGL implementations required smaller codebases, then the meager effort that Intel already expends might be sufficient to create a solid GL implementation. That is the goal.

Maybe this will fail to achieve the goal. But it is certainly true that doing nothing will fail to achieve the goal, as it has already failed to do so for 10 years.

they (wrongly) get off lightly from you developers by employing a “blame the API” strategy that places the costs of deprecation wholly on YOU rather than them just properly designing, implementing, and testing good OpenGL implementations.

I disagree.

As a practical, reasonable programmer, I understand that there are tradeoffs. I understand why an implementation may not bother to optimize display list calls or glBitmap calls. I do not hold this against them. They, like the rest of us, live in the real world of limited budgets and manpower. They focus on what gets the best bang for the buck.

And as a developer, I prefer having the control that a lower-level interface provides. I do not want to have to guess at what APIs work well or not. If it means more work on my part, then I accept that.

There are two solutions to the implementation problem. One solution is to force all implementations to be complete and optimized. The other is to make what they’re implementing simpler, so that the implementation can be more complete against the simpler specification.

Option one does not exist. We have tried it for 10 years, and there has been no success. Some implementations have gotten better, this is true. But the fact remains that there are API landmines that throw you off the fast path in most implementations. These will not go away no matter what we do.

Refusing to accept deprecation and relying on ARB_compatibility will not change things. It will not give ATI or Intel added reason to spend more resources on OpenGL development.

I agree that the problems are the fault of implementers. But since they have not in 10 years fixed these problems, it is clear that they are not going to or are not able to. Therefore, it is incumbent upon us to find an alternative solution that can benefit both parties. “My way or the highway” doesn’t work; compromises may.

Even if it means I have to convert GL_QUADS to GL_TRIANGLES for no real reason, I am willing to do so as my part of the compromise position. I expect better performing and better conforming implementations from those who have been deficient in return.

I have contacted Mark and can confirm that this is indeed his account. So I would like everyone to withhold any speculation about his identity in the future. Thanks :wink:

I think that Mark Kilgard’s last post brings this “dialogue of the deaf” (if I may call it that) to a fitting close.

The easiest way to draw text is wglUseFontBitmaps under Windows, which we can no longer use with OpenGL 3.1… I’ve tried a lot of other methods; they all rely on third-party software, most of which has no support for Unicode, which is a giant fail. Is it so hard to get font rendering with decent anti-aliasing built into OpenGL? :eek:

I think the font issue alone will be enough to scare CAD developers completely away from OpenGL 3.1.
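
For what it’s worth, the usual replacement is to bake the glyphs into a texture atlas up front (FreeType, stb_truetype, GDI, whatever you prefer; the atlas also gives you anti-aliasing through ordinary texture filtering) and then draw two indexed triangles per glyph. A minimal sketch of the CPU side, with the Glyph struct and the layout math invented for illustration:

```c
/* One way to replace wglUseFontBitmaps under core GL 3.1: bake the glyphs
 * into a texture atlas up front, then draw two indexed triangles per glyph.
 * Everything below is a sketch; the Glyph struct and pen math are invented. */
typedef struct {
    float u0, v0, u1, v1;        /* atlas rectangle                 */
    float w, h;                  /* glyph size in pixels            */
    float bearing_x, bearing_y;  /* offset from the pen position    */
    float advance;               /* pen advance to the next glyph   */
} Glyph;

/* Append 4 vertices (x, y, u, v) and 6 indices for one glyph quad. */
static void push_glyph(float verts[16], unsigned short idx[6],
                       unsigned short base, float pen_x, float pen_y,
                       const Glyph *g)
{
    float x0 = pen_x + g->bearing_x, y0 = pen_y - g->bearing_y;
    float x1 = x0 + g->w,            y1 = y0 + g->h;

    const float v[16] = { x0, y0, g->u0, g->v0,   x1, y0, g->u1, g->v0,
                          x0, y1, g->u0, g->v1,   x1, y1, g->u1, g->v1 };
    const unsigned short i[6] = { base, base + 1, base + 2,
                                  base + 2, base + 1, base + 3 };

    for (int k = 0; k < 16; ++k) verts[k] = v[k];
    for (int k = 0; k < 6;  ++k) idx[k]   = i[k];
}
```

Kerning, ligatures and Unicode shaping all stay on the CPU when you compute the pen position for each glyph; the GPU only ever sees indexed triangles and a single texture, so the anti-aliasing quality comes down to how the atlas was rasterised and filtered.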