OpenGL and depressing deprecation!

AstroM · June 1, 2014, 8:56am

Hey!

It’s been a while since I was here and I don’t know where to post this, but I feel I must vent my thoughts about OpenGL somewhere.
After completing several OpenGL projects, including a commercial one, with considerable success and ease, I’m becoming more and more worried about the future of OpenGL.

Since I’m upgrading the graphics in a current project, I have been experimenting quite a lot with shadow volumes, GLSL and all that stuff, and I must say I find that mixing good old OpenGL with some shaders is by far the easiest and fastest way to get something drawn on the screen.

I just started with geometry shaders, and they work great with the old deprecated functions. I found that about 99.9% of the code in my biggest project uses deprecated functions.

Since I’m in the middle of a general upgrade of a project, I’m getting a bit concerned about whether I should start all over.
The next issue is that if I start all over, I’m not sure I should stick with OpenGL.
Don’t get me wrong, I love OpenGL, in its current form. I started with DirectX 5 about 100 years ago, and after struggling with it for a year, it took me just under a week to accomplish the same in OpenGL, with better results and speed.
So in my opinion, the reason it is so widespread is that it’s so easy to get the hang of. Yet the latest versions deprecate almost EVERYTHING that makes GL so easy to use.

I know I will get a lot of arguments about all the “advanced” users who get the hang of all the maths behind 3D graphics (not that I’m a total idiot myself). But to me it sounds like 3D programming in OpenGL is moving further and further away from the independent users with the cool ideas and more towards the “professionals” who know how to code but not what to code.
It could lead to a dangerous future if OpenGL gets too difficult to use, especially when a lot of helper libraries and the like are available for the “other” API.

I really hope the old functions will stay as they are in graphics card drivers. It’s still amazing to download drivers the size of medium-sized full games, and the deprecated OpenGL part can’t make up many of those megabytes anyway.

It still shocks me that display lists were deprecated. I will not be able to do without them. You can make many arguments that VBOs are more flexible and all that, but since shaders became possible, it’s almost a brand new start for display lists, because you can animate the primitives in display lists using GLSL.
And I have never seen a VBO that comes even close to executing faster than a display list.

Finally, my actual question: WHEN should you actually stop using the deprecated functions?
And don’t give me the standard ASAP answer :D. I hope someone with deeper insight into the Khronos Group can give an educated guess anyway.

arekkusu · June 1, 2014, 9:35am

How about: “when you want to port your code to a phone.”

mhagain · June 1, 2014, 1:55pm

Now.
Yesterday.
Tomorrow.
Next week.
Never.

These are all valid answers and just illustrate the point: if the deprecated functions continue to work well for what you’re doing, then just continue using them.

Part of the drive towards deprecation was a feeling that the OpenGL API was getting too big, too unwieldy, too complex, with 17 different ways of specifying vertex formats and 53 different ways of actually drawing them. By focussing it down to one way, in theory there is only one fast path, driver writers get to target optimizations better, programmers know exactly what they need to do to hit that fast path, and consumers get a more predictable and consistent experience. We get to wave goodbye to silly situations like: “draw it this way and it’s fast on NV but grinds to single-digit framerates on AMD, draw it the other way and it’s fast on AMD but NV goes into software emulation unless these 3 specific states are enabled, which in turn cause Intel and AMD to explode unless those other 2 states are disabled but doing that causes AMD to … f-ck it, I’ll just write eight different rendering backends and be done with it”.

In practice deprecation is not mandatory. Your current code will continue to work in future. Where you might hit trouble is if you’re using a really new feature that doesn’t define any specific interaction with the older drawing commands, so check extension specs, check which OpenGL version they’re written against, have some degree of familiarity with what drawing commands you can use with that version (you don’t need to know them in detail) and make some informed decisions based on actual facts. Above all, test on different hardware configurations so that you don’t fall into the “works on NV only” trap.

GClements · June 2, 2014, 7:11am

Unfortunately, “hello world” programs aren’t the priority. As software becomes more complex, the overhead of the “boilerplate” becomes proportionally less significant.

While the compatibility profile won’t be going away any time soon, eventually you’re likely to be forced to make a choice between the legacy API and the new features. In particular, Apple have said that they won’t be adding any new extensions to the compatibility profile. So if you want the newest features, you’ll have to use the core profile.

You can’t separate the code into the “legacy” and “new” parts. If you can use both the legacy features and the new features at the same time, then the code which implements the new features has to take account of all the legacy features. The result is that complexity grows exponentially.

It would seem that X11 (and specifically the network-transparency aspect) isn’t as important a platform as it once was. Bear in mind that the original motivation for display lists was to avoid sending the same sequence of commands over the network (which, in those days, was limited to 10 Mbit/sec) every frame. Outside of that use-case, display lists aren’t all that important.

When you no longer need to support systems which lack OpenGL 3.x.

Nikki_k · June 2, 2014, 1:30pm

To be honest, 90% of the deprecated stuff belongs to the garbage bin.

The only regrettable thing is that they threw away the immediate mode drawing commands without providing an efficient way to draw large amounts of small, dynamic low polygon batches. The vertex buffer upload times for these are a true performance killer when using glBuffer(Sub)Data, so this was probably the biggest roadblock to core profile adoption by software.

Only some very recent features provide a viable alternative for this.

Granted, I occasionally miss the convenience of the built-in matrices, but all the rest, including the entire fixed-function pipeline and especially the display lists, is heavy baggage that needs to be carried around by the drivers but offers very little use aside from supporting ancient hardware.

The main problem I have with deprecation is that everything was removed at once instead of gradually. The deprecation in 3.0 was a clear indicator of which functionality needed to go, but the time between deprecation and removal was just too short for removing everything in one move. No software can adapt to changes that pull the rug out from under its feet in one move. The result of this ill-thought-out strategy is the compatibility profile, which will probably haunt OpenGL for all eternity.

I think it would have gone a lot smoother if 3.x had only deprecated, and finally removed, the fixed-function pipeline, leaving the rest for 4.x. Because everything was done at the same time, software is a lot less likely to be upgraded than it could have been.

mhagain · June 2, 2014, 3:34pm

The problem is that buffer objects have been in core since GL1.5. They’re not anything new, yet we frequently see people making this complaint as if they were. There are plenty of intermediate porting steps available - one of them is called GL2.1 - and it’s difficult to see how adding yet another intermediate porting step could have made the situation any different. Odds are we’d still be seeing the same complaints if that had been done.

Right now you have three main options when it comes to older code:

1. Don’t port it at all. It will continue to work; some day you may encounter a case where deprecated functionality doesn’t coexist nicely with new higher functionality, but if you’re maintaining a legacy program you may usefully decide to just stay with GL2.1 or lower, and your program will just continue working as before.
2. Bring it up to GL2.1, move it over to buffer objects only, move it over to shaders only, then make the jump to core contexts. This is what I consider the most sensible approach for those who don’t want to take option 1; you get to make the transition to buffer objects only and shaders only at a pace that suits you, then, when and only when that’s complete, you jump up and start accessing higher functionality.
3. Do a full rewrite to bring it up to core contexts in one go. This is problematical for reasons outlined here and elsewhere.
Thing is, a lot of complaints about deprecation and removal of deprecated functionality are written as if options 1 and 2 did not exist.

Regarding “large amounts of small, dynamic low polygon batches”, buffer objects in GL2.1 actually do have options that will let you handle them; the real performance problem comes with a naive implementation, where you take each glBegin/glEnd pair and treat it as a separate upload/separate draw. That’s not a fault of the API, it’s a fault of how you’re using it, and arguably also a fault of the older specifications for not providing clarity on how things should be used in a performant manner. The solution of course is to batch your updates so that you’re only making a single glBufferSubData call per-frame, then fire off draw calls. Yes, that means work and restructuring of code if you’re porting from glBegin/glEnd, but that brings me back to the first point: none of this is anything new; we’ve had since GL1.5 to make the transition to buffer objects, so complaining about it now seems fairly silly.
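To make the batching idea concrete, here is a minimal sketch in Python, used only as runnable pseudocode (no GL context is involved). It assumes a hypothetical `FrameBatch` helper that records each former glBegin/glEnd pair as a (mode, first, count) range inside one shared per-frame vertex array; the `upload` and `draw` callbacks are placeholders standing in for a single glBufferSubData call and for glDrawArrays respectively.

```python
from dataclasses import dataclass, field


@dataclass
class DrawRange:
    mode: str   # primitive type label, e.g. "GL_TRIANGLES"
    first: int  # index of the first vertex in the shared per-frame array
    count: int  # number of vertices in this draw


@dataclass
class FrameBatch:
    """Hypothetical helper: collects every former glBegin/glEnd pair of a frame."""
    vertices: list = field(default_factory=list)
    draws: list = field(default_factory=list)

    def add_primitive(self, mode, verts):
        # No GPU work here; just remember where this primitive's vertices start.
        self.draws.append(DrawRange(mode, len(self.vertices), len(verts)))
        self.vertices.extend(verts)

    def submit(self, upload, draw):
        # One upload for the whole frame (stands in for a single glBufferSubData),
        # then one call per recorded range (stands in for glDrawArrays(mode, first, count)).
        if self.vertices:
            upload(self.vertices)
            for d in self.draws:
                draw(d.mode, d.first, d.count)
        # Start a fresh frame with new lists, so callbacks may keep the old ones.
        self.vertices, self.draws = [], []


# Usage: two primitives that used to be two separate glBegin/glEnd pairs.
uploads, calls = [], []
batch = FrameBatch()
batch.add_primitive("GL_TRIANGLES", [(0, 0), (1, 0), (0, 1)])
batch.add_primitive("GL_LINES", [(0, 0), (1, 1)])
batch.submit(lambda v: uploads.append(len(v)),
             lambda mode, first, count: calls.append((mode, first, count)))
```

The design point is that the per-primitive cost drops to a list append: the expensive transfer happens once per frame, and the draw calls differ only in their offsets.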

Nikki_k · June 5, 2014, 3:18am

mhagain wrote:
“Bring it up to GL2.1, move it over to buffer objects only, move it over to shaders only, then make the jump to core contexts. This is what I consider the most sensible approach for those who don’t want to take option 1; you get to make the transition to buffer objects only and shaders only at a pace that suits you, then, when and only when that’s complete, you jump up and start accessing higher functionality.”

I see you completely fail to see the issue here, namely that this is a non-trivial transition that may require a MASSIVE investment of time, if at all possible. Let’s make it clear: There are situations where the limitations of GL 3.x buffers will make this entirely impossible (because all of the existing buffer upload methods are too slow. Fun part: In Direct3D this was significantly less of an issue!) I have been working on a project that makes very liberal use of immediate mode drawing whenever something is available to be drawn. This project is also (quite unsurprisingly) 100% CPU-bottlenecked. But to ever make this work with buffers we have to do even more maintenance on the CPU without experiencing any benefit whatsoever from faster GPU access, just to optimize buffer upload times. In other words, up until very recently, porting was a no-go - just because of immediate mode. Getting the code away from the fixed function pipeline and the builtin matrix stack was a simple and straightforward matter by comparison - but with immediate mode we had to wait for almost 6 years until a viable replacement came along (with the emphasis being on ‘viable’!)

Apparently you don’t see the ramifications of how the core profile was handled. It was utterly shortsighted and only concerned with newly developed software; apparently no thought was given to how the deprecation mechanism could be used to get old, existing software upgraded to newer features. The plain and simple fact is that the amount of ported software will be inversely proportional to the amount of work required. The more work is needed to port an existing piece of software, the less likely it is to be ported, because if porting involves throwing away large pieces and starting from scratch, the inevitable result will be your first option: Don’t port it at all!

The question is: do the OpenGL maintainers really want that? It should have been clear from the outset that using new features has to mean playing by the new rules exclusively. Deprecation should help steer developers away from the old and toward the new, so that in the future the old stuff will be gone. That means you have to take a gradual approach. If you advance too brutally, you lose touch with your developers.

And in GL 3.x land it is far easier to port old code away from the fixed function pipeline to shaders than it is to rewrite code to use buffers - in fact in many cases it’s utterly impossible because the internal organization simply doesn’t allow it.

But that’s simply not possible if such a careless approach is taken. If you want to let go of the old, you have to make 200% certain that people will adopt the new.
That means you deprecate stuff that’s more or less equivalently replaced by an existing modern feature - you DO NOT(!!!) deprecate (and don’t even think about removing) stuff that forces a complete application rewrite! If you can’t deprecate immediate mode without providing an equivalent replacement that’s better tied into the ‘modern’ ways, you hold off on deprecation!
So, at the time of 3.0, fixed function was more or less equivalently replaced by shaders; the matrix stack is a special case, but since it’s barely part of the hardware state, it’s not a big deal, so deprecating both was fine.
On the other hand, if you wanted to replace code that’s inherently tied to liberally invoking draw calls via immediate mode - you alternatively do it… … … how?!? The answer is, you can’t!
It might have messed up the core profile for a few more versions, but I’d guarantee you that if that had happened there would not be a compatibility profile now! I’d rather have taken such a temporary mess than the permanent one we have now.

And to be clear about it: Yes, immediate mode needed to go, but it was removed at the wrong time! Now, with persistently mapped buffers, I can suddenly port over all my old legacy code without any hassle (as in: no need to restructure the existing logic, because there is no longer a performance penalty for just putting some data into a buffer and issuing a draw call) - but wait - there’s a ‘BUT’! Immediate mode has been gone from the core profile for years, and there are already drivers out there which implement only the core profile for post-3.0 versions. Although this doesn’t prevent porting, it definitely makes it harder if some co-workers who are stuck with such a driver also need to work on the project. So we are now in a situation that a properly designed deprecation mechanism was supposed to avoid (as in, you always have one version at your disposal where the old feature you want to replace and the new one you want to replace it with both exist and are fully working!)

Face it: 90% of all existing legacy code is structurally incompatible with the 3.x core profile! That means, 90% of all existing legacy code is never getting ported at all to 3.x core! The reason it’s structurally incompatible is not the deprecation of the fixed function pipeline - it’s also not the deprecation of the matrix stack - it’s solely the deprecation of immediate mode rendering without having any performant means to simply replace it.
And the inevitable result of this was pressure to compromise - and behold - that compromise has become the monster called ‘compatibility mode’. Had deprecation been done properly in a way to drive developers toward updating their code instead of having a preservationist crutch hacked into the driver, things might look better now.

It doesn’t matter one bit that buffers had been core since 1.5. Before deprecation of immediate mode, developers chose between those two based on which approach was better suited to the problem at hand. Sometimes a buffer works better, but at other times it performs far worse. And it’s these ‘far worse’ situations that were inadequately dealt with in GL 3.x core.

thokra · June 5, 2014, 6:35am

AstroM: My take is simple: no legacy GL in new code.

If you’re forced to maintain a legacy code base, usually due to economical, time and compatibility constraints, by all means, keep the legacy well and clean. As mhagain already stated: there are core GL 4.4 features you can already use even in legacy code, the most prominent being plain vertex buffer objects.

I see you completely fail to see that it will be a huge or possibly massive investment of time anyway. The question is, do you invest the time in small steps, porting feature by feature, or do you go ahead and rewrite everything. Going from legacy to modern core OpenGL takes time and care - no doubt. Still, mhagain already proposed the first option - and he’s right to do so IMO.

Proof please. I’m not aware of any D3D10 feature that so massively kicks GL’s ass. Or am I simply not aware of something similar to persistently mapped buffers in D3D10? I thought the only thing giving you an advantage over GL is D3D10_MAP_WRITE_NO_OVERWRITE combined with a D3D10_MAP_WRITE_DISCARD at the beginning of the frame.
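For readers who haven’t used the D3D pattern mentioned here, a minimal sketch of its bookkeeping follows, again in Python and with no real API calls. The `StreamingBuffer` class is hypothetical; its comments name the D3D10 map flags from the post and the OpenGL glMapBufferRange access bits usually described as their counterparts (GL_MAP_INVALIDATE_BUFFER_BIT for orphaning, GL_MAP_UNSYNCHRONIZED_BIT for no-overwrite appends).

```python
class StreamingBuffer:
    """Offset bookkeeping for a write-only streaming vertex buffer.

    Assumed mapping (hypothetical helper, no real API calls):
      discard()  ~ D3D10_MAP_WRITE_DISCARD, or in GL glMapBufferRange with
                   GL_MAP_INVALIDATE_BUFFER_BIT (the driver orphans the old storage)
      allocate() ~ D3D10_MAP_WRITE_NO_OVERWRITE, or in GL GL_MAP_UNSYNCHRONIZED_BIT
                   on a range that no draw queued so far reads from
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.offset = 0
        self.discards = 0

    def discard(self):
        # Fresh storage: the GPU keeps reading the old copy, so writing at 0 is safe.
        self.offset = 0
        self.discards += 1

    def allocate(self, size):
        if size > self.capacity:
            raise ValueError("request is larger than the whole buffer")
        if self.offset + size > self.capacity:
            self.discard()  # out of room mid-frame: orphan again instead of waiting
        start = self.offset
        self.offset += size
        return start  # region not yet written since the last discard


buf = StreamingBuffer(1024)
buf.discard()                                   # at the beginning of the frame
offsets = [buf.allocate(400) for _ in range(3)]
```

Appends never touch a range that queued draws may still read, so no synchronization is needed; the cost of running out of room is an extra orphaning rather than a stall.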

Yeah yeah, you mentioned that already - several times - in another thread. It’s high time you tell us what frickin’ exotic scenario you’re talking about. Otherwise you’ll simply stay in that magical position that no one here can disagree with because there aren’t enough hard facts to do so. Cut the crap and get real.

Microsoft did it. Maintaining backwards compatibility for over 20 years is ludicrous for something like OpenGL. D3D10/11 doesn’t give a crap about the D3D9 API. The thing is, even if you only leverage the features that comply with the D3D9 feature subset still supported by D3D11, you still have to code against the D3D11 API. You can’t even use the old D3D9 format descriptors. No way you’re gonna have a D3D11 renderer and still write stuff similar to glEnableClientState(GL_FOG_COORD_ARRAY), to mention just one example that makes me want to jump out the window, while at the same time wanting to have kids with the GL 4.4 spec because of GL_ARB_buffer_storage.

And what are we gonna do anyway? Suppose there had been a compatibility break and we now were forced to either stay with GL3.0 at max OR start rewriting our code bases to use GL3.1+ core features - what would have been the alternative? Transition to D3D and a complete rewrite of everything? Also, where I work, we’re supporting Win/Linux/Mac - go to D3D and you have to write another renderer if you want to keep Linux and Mac around.

IMO, you have to make sure that people have to adopt - see D3D. That’s where the ARB failed - by letting us use the new stuff and the old crap side-by-side. I seriously doubt many companies would have been pissed off enough to leave their GL renderers behind.

And there is no substantial problem I know of that’s solvable with GL2.1 but not with GL 3.1+ - if you have one, stop rambling and prove it with an example.

Name one feature you’re missing from GL 3.1+ that forced you to rewrite your entire application. I’m very, very curious. If your answer is gonna be what you repeatedly mentioned, i.e. immediate mode vertex attrib submission is king and everything else is not applicable or too slow (which is a hilarious observation in itself), I refer you to my earlier proposition.

Oh, my bad, there it is again …

Liberally invoking draw calls? Since when is someone writing a real world application processing large vertex counts interested in liberally invoking draw calls? Please define “liberally” and please state why you can’t batch multiple liberal draw calls into one and source the attribs from a buffer object. Otherwise, this is just as vague as everything else you stated so far to defend immediate mode attrib submission.

More than 15 years isn’t enough? Seriously?

See? That’s what I’m talking about … the code to do that, except for a few lines of code, is exactly the same. In fact, with persistent mapping, you have to do synchronization inside the draw loop yourself - a task that’s non-trivial with non-trivial applications.

Persistent mapping is an optimization and it doesn’t make rewriting your write hundreds of times easier. You, however, continue to state this perverted notion that persistently mapped buffers are the only viable remedy for something that was previously only adequately solvable with immediate mode … Have you ever had a look at the “Approaching Zero Driver Overhead” presentation from GDC14? Did you have a look at the code sample that transformed a non-persistent mapping into a persistent mapping? Your argument before was that you cannot replace immediate mode with anything other than persistently mapped buffers. If you’re so sure about what you’re saying, please explain the supposedly huge difference between an async mapping implementation and a persistent mapping implementation - because you didn’t say that async mapping was too slow because of implicit syncing inside the driver or something (and AFAIK that’s only reportedly the case with NVIDIA drivers, which really seem to hate MAP_UNSYNCHRONIZED), you said you couldn’t do it at all.
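The synchronization you have to do yourself with persistent mapping can be sketched as follows, under simplifying assumptions: the GPU is simulated by a hypothetical `FakeGpu` whose fenced work finishes a fixed number of ticks later, and the dict-based fences only imitate glFenceSync/glClientWaitSync. In real code the buffer would be created once with glBufferStorage using GL_MAP_PERSISTENT_BIT (plus GL_MAP_COHERENT_BIT or explicit flushes) and mapped once; the sketch only models which section of it is safe to write.

```python
from collections import deque


class FakeGpu:
    """Stand-in for the GPU: work fenced at time t completes at t + latency."""

    def __init__(self, latency):
        self.latency = latency
        self.time = 0
        self.pending = deque()  # (completion_time, fence), in submission order

    def fence(self):
        # Plays the role of glFenceSync after the frame's draw calls.
        f = {"signaled": False}
        self.pending.append((self.time + self.latency, f))
        return f

    def tick(self):
        self.time += 1
        while self.pending and self.pending[0][0] <= self.time:
            self.pending.popleft()[1]["signaled"] = True

    def wait(self, f):
        # Plays the role of glClientWaitSync; returns how many ticks the CPU stalled.
        stalled = 0
        while not f["signaled"]:
            self.tick()
            stalled += 1
        return stalled


class PersistentRing:
    """One persistently mapped buffer split into sections, one fence per section."""

    def __init__(self, gpu, sections):
        self.gpu = gpu
        self.fences = [None] * sections
        self.index = 0
        self.stalls = 0

    def acquire(self):
        # Before overwriting a section, wait until the GPU finished the frame that read it.
        f = self.fences[self.index]
        if f is not None:
            self.stalls += self.gpu.wait(f)
        return self.index

    def release(self):
        # Once the draws reading this section are issued, fence it and move on.
        self.fences[self.index] = self.gpu.fence()
        self.index = (self.index + 1) % len(self.fences)


def run(sections, frames=10, latency=2):
    gpu = FakeGpu(latency)
    ring = PersistentRing(gpu, sections)
    for _ in range(frames):
        ring.acquire()   # write this frame's vertices into the returned section here
        ring.release()   # ...after issuing the draw calls that read them
        gpu.tick()       # the GPU makes progress while the CPU prepares the next frame
    return ring.stalls
```

With the simulated two-tick latency, a single section stalls on every frame after the first, while three sections never stall; trading memory for stall-free writes is the usual reason for splitting the ring into several sections.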

Again, there is nothing of importance you can’t do with core GL 3.1+ that you can do with GL 2.1 - except for quads maybe. You have everything you need at your disposal to go from GL2.1 to core GL 3.0 - and everything you write then is still usable even if you then move directly to a GL 4.4 core context.
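On the quads point, one common workaround, sketched here under the assumption of planar, convex quads, leaves the existing GL_QUADS-ordered vertex data untouched and only adds an index list so the same vertices can be drawn as GL_TRIANGLES (for example with glDrawElements). The function name is hypothetical.

```python
def quads_to_triangle_indices(quad_count):
    """Index list that draws `quad_count` quads (4 vertices each, in the order
    GL_QUADS used) as GL_TRIANGLES, e.g. via glDrawElements with an index buffer."""
    indices = []
    for q in range(quad_count):
        a = 4 * q
        # Split quad (a, a+1, a+2, a+3) along its a..a+2 diagonal; both
        # triangles keep the quad's original winding order.
        indices += [a, a + 1, a + 2, a, a + 2, a + 3]
    return indices


print(quads_to_triangle_indices(2))
```

Since the index list depends only on the quad count, it can be generated once for the largest expected batch and reused.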

Even if it means a little more work, it’s almost definitely solvable and never a worse solution. If I’m wrong, please correct me with concrete examples.

Nice assumption. Got any proof? You are aware that you’re talking about almost every application out there using OpenGL, right? Also, a rewrite is essentially also a port - neither does your renderer cease to exist, nor do the problems you solved before completely vanish just because you’re bumping the GL version.

You have not produced any statistics that support this statement, also no code or a high-level description of your problem at hand. You’re just rambling on and on …

Wrong again. Developers chose client-side vertex arrays before VBOs because, for amounts of data above a certain threshold, client-side vertex arrays substantially improve transfer rates and substantially reduce draw call overhead. Plus, there is no way of rendering indexed geometry with immediate mode, because you need either an index array or, surprise, a buffer object holding indices.

Again, purely speculation - and stating that a buffer object supposedly performs better than immediate mode sometimes … that’s really something to behold. Unless the driver is heavily optimized to batch the vertex attributes you submit and send the whole batch once you hit glEnd(), or uses some even more refined optimizations, there is no way immediate mode submission can be faster than sourcing directly from GPU memory - not in theory and not in practice.

Name three.

mhagain · June 5, 2014, 9:45am

No.

The point is that this isn’t a GL3.x+ problem; this is a problem that goes all the way back to GL1.3 with the GL_ARB_vertex_buffer_object extension, so you’ve had more than ample time to get used to the idea of using buffer objects, and more than ample time to learn how to use them properly.

Talking about it as though it were a GL3.x+ problem and as if it were something new and horrible isn’t helping your position. How about you try doing something constructive like dealing with the problem instead?

And for the record, I also know and have programmed in D3D8, 9, 10 and 11, so I’m well aware of the no-overwrite/discard pattern and of the differences between it and GL buffers.

malexander · June 5, 2014, 12:06pm

As long as you’re only targeting AMD and Nvidia cards on Windows or Linux, you can use a GL4.4 compatibility context. However, if you ever intend to support GL3/4 features with Windows/Intel graphics (which is becoming a larger segment) or OSX, you’ll need to use a core profile context. OSX only gives you the option of an older GL2.1-based context (with a few rather old GL3 extensions thrown in), or a pure core profile context: GL3.2 (10.7), GL4.1 (10.9) or GL4.4 (10.10).

As our application (with TONS of GL1.x code in it, on the order of tens of thousands of lines of GL-specific code) works on OSX, we had to undertake a conversion to modern GL from display lists and immediate mode (in the worst case), as supporting both a GL3.2 and a GL2.1 rendering backend proved to be more problematic than taking the core-profile plunge. We were also concerned that Apple might simply drop the GL2.1 profile at some point, as they’re known to do with old APIs.

Most of our GL code was for drawing 2D UI elements, but we also have an extensive 3D viewport (polys, NURBS, volumes, etc). We completely rewrote the 3D rendering code (which took quite an investment), but conversion of the 2D UI elements took significantly less time (2-3 months). The 3D conversion was done to generally improve performance and appearance, and used modern core GL3.2+ features. It was the 2D UI conversion that was done specifically because of core-GL platform issues. Someone way back had wrapped all the immediate mode and basic GL commands in our own functions (which had debugging and assertion code, etc), so we kept these and replaced the underlying GL mechanism. It now streams these vertex values to VBOs, and only flushes them when the GL state changes significantly enough to warrant it (a texture change, for example). After that, we looked at rendering bottlenecks, found the slow rendering paths that went through our pseudo-immediate-mode code, and hand-converted those to modern GL.
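The wrapper approach described above might look roughly like the following sketch. Everything in it is hypothetical and simplified: Python stands in for the real wrapper functions, only triangle lists are handled so consecutive primitives can be concatenated, and a texture change is the only state change that forces a flush. The `flush_fn` callback stands in for streaming the vertices into a VBO and issuing one draw.

```python
class PseudoImmediate:
    """Hypothetical wrapper keeping a glBegin/glVertex/glEnd-style interface.

    Vertices go into a CPU-side stream; the stream is flushed (one upload plus
    one draw via flush_fn) only when the bound texture changes or the caller
    flushes explicitly, e.g. at the end of the frame. Assumes every primitive
    is a plain triangle list, so consecutive primitives can simply be concatenated.
    """

    def __init__(self, flush_fn):
        self.flush_fn = flush_fn  # called as flush_fn(texture, vertices)
        self.texture = None
        self.pending = []         # vertices waiting for the next flush
        self.current = None       # vertices of the primitive being built

    def bind_texture(self, texture):
        if texture != self.texture:
            self.flush()          # earlier vertices must be drawn with the old texture
            self.texture = texture

    def begin(self):
        self.current = []

    def vertex(self, x, y):
        self.current.append((x, y))

    def end(self):
        self.pending.extend(self.current)
        self.current = None

    def flush(self):
        if self.pending:
            self.flush_fn(self.texture, self.pending)
            self.pending = []


calls = []
imm = PseudoImmediate(lambda texture, verts: calls.append((texture, len(verts))))
triangle = [(0, 0), (1, 0), (0, 1)]
for texture in (1, 1, 1, 2):
    imm.bind_texture(texture)
    imm.begin()
    for x, y in triangle:
        imm.vertex(x, y)
    imm.end()
imm.flush()
```

Three triangles drawn with the same texture end up in a single flush; only the texture change forces a second one, which is what lets existing call sites stay untouched.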

So it’s not impossible, but certainly isn’t trivial either. If you are thinking about Mac OSX as a potential platform and want to use modern GL features, you’ll be faced with this problem. Otherwise, I wouldn’t worry about the core profile at all, and instead gradually upgrade the parts of your application that will benefit from modern GL techniques (whether it be performance or new capabilities).

Nikki_k · June 11, 2014, 4:24am

malexander wrote:
“So it’s not impossible, but certainly isn’t trivial either. If you are thinking about Mac OSX as a potential platform and want to use modern GL features, you’ll be faced with this problem. Otherwise, I wouldn’t worry about the core profile at all, and instead gradually upgrade the parts of your application that will benefit from modern GL techniques (whether it be performance or new capabilities).”

And behold - here lies the problem with it all! Yes, we want our code to work on MacOSX and Intel hardware but what the theoreticians completely overlook is that management also has a say in the matter, resulting in the following:

no rewrite from the ground up

no change of general program flow

no time consuming changes

Of course it’s easy to say ‘you should have done…’ and other smart-ass remarks but they always fall way off the mark of reality. That’s what some people seem to forget: The old legacy code exists, and in some form it needs to continue to exist, and worse, it needs to be kept operable on more modern systems.

So here it goes:

mhagain wrote:
“No.

The point is that this isn’t a GL3.x+ problem; this is a problem that goes all the way back to GL1.3 with the GL_ARB_vertex_buffer_object extension, so you’ve had more than ample time to get used to the idea of using buffer objects, and more than ample time to learn how to use them properly.”

Yes, tell that to the people who made the mess more than 10 years ago. I’d fully agree that it was badly designed, but that’s what I have to deal with, and nothing you can say will make the code go away.
But it’s complete bullshit anyway. glBegin/glEnd was a tried and true feature until GL 2.1, so whatever you are trying to say here is way off the mark. You are arguing from a theoretical standpoint, completely forgetting that what I have to deal with is code that actually exists and actually needs to be kept working.
Plus, the performance characteristics of both methods are so totally different that there’s simply no 1:1 transition, that’s why the old code was never changed.

Of course it’s a GL 3.x problem, that’s when the immediate mode stuff was deprecated and some driver makers decided to drop it without any equally performant feature to replace it.

And now to the other person who doesn’t seem to have a grasp on the maintenance of old legacy code…

thokra wrote:
“AstroM: My take is simple: no legacy GL in new code.

If you’re forced to maintain a legacy code base, usually due to economical, time and compatibility constraints, by all means, keep the legacy well and clean. As mhagain already stated: there are core GL 4.4 features you can already use even in legacy code, the most prominent being plain vertex buffer objects.”

Sorry, that doesn’t work. ‘Legacy’ doesn’t necessarily mean keeping the old feature set. What if you want to upgrade to integrate some newer shader-based features but, for one reason or another, cannot afford a complete overhaul of your code base, be it for financial or time reasons? In that case you have to find a compromise.
So far the compromise has been the compatibility profile, but at my workplace everybody agrees that this is a stopgap measure at best, and that as soon as it’s technically doable, we migrate to a core profile so that we aren’t locked to AMD and NVidia on Windows.

The orders are not to make a huge investment of time. And as I already said, mhagain’s proposal has already been nixed. It can’t be done. End of story. Too much work for no gain. We’d have to do months of work with no result in sight, and that’s plain and simply not affordable.
So again, find a compromise that gets us where we want to be. (Yes, you read that correctly: The operative term is always to ‘compromise’…)

The problem seems to be that buffer mapping is a lot more efficient with D3D than with OpenGL 3.x. All I can tell you is that the buffer updates were killing us with GL but not in a D3D test setup.

I think I have said this countless times before: the code I have to deal with is sprinkled with immediate mode draw calls, one quad here, one triangle fan there, and a triangle strip elsewhere. It’s not exotic, it’s just crufty, bad old code from another time. Due to the way all of this is done, it’s very hard to optimize. Since I am not allowed to disclose more information, you’ll have to take my word that the only way to port this to a buffer-based setup is to upload each primitive’s data separately, issue a draw call and move on. The code is inherently tied to such an approach (which was all well and good when it was written a long time ago).
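As a side note on mixed primitives like these: one common technique (not something the post says was tried) is to rewrite fans and strips as plain triangle lists so that unrelated primitives can share one index buffer and one draw call. A minimal sketch with hypothetical function names; the strip conversion follows the OpenGL rule that every second triangle of a strip has its first two vertices swapped, so all triangles keep the same winding.

```python
def fan_to_triangles(n):
    """Indices turning an n-vertex GL_TRIANGLE_FAN into a GL_TRIANGLES list."""
    return [i for k in range(1, n - 1) for i in (0, k, k + 1)]


def strip_to_triangles(n):
    """Indices turning an n-vertex GL_TRIANGLE_STRIP into a GL_TRIANGLES list,
    swapping the first two indices of every second triangle to keep winding."""
    out = []
    for k in range(n - 2):
        out += [k, k + 1, k + 2] if k % 2 == 0 else [k + 1, k, k + 2]
    return out


def merge(primitives):
    """primitives: list of (vertex_count, local_indices). Returns one index list
    with each primitive's indices shifted past the vertices that precede it."""
    merged, base = [], 0
    for count, local in primitives:
        merged += [base + i for i in local]
        base += count
    return merged


# A 5-vertex fan followed by a 4-vertex strip, as one GL_TRIANGLES index list.
combined = merge([(5, fan_to_triangles(5)), (4, strip_to_triangles(4))])
```

The vertex data of each primitive is simply appended to a shared array in the same order; only the generated indices change.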

… which ultimately was the reason why we decided against porting to D3D. As soon as you need to move beyond the currently set-in-stone feature set you are screwed. A new API every 3 or 4 years is deadly if you got to work with software that may exist for a decade or more and also needs to be kept up to date to a degree - not to mention that D3D11 is restricted to Windows 7, causing problems if it needs to be accessed from an older system.

What would have happened? Easy to answer: the code would have stayed as it was, limited to GL 2.1 features with no chance of ever being upgraded, and our bosses would be quaking in their shoes that the old API might eventually vanish completely.

And here you are forgetting something:
D3D is mainly used for entertainment software, which MUST keep up with current technology. A 5-year-old D3D9 engine won’t cut it anymore for a new product.
The same is not true for corporate software, which is often badly maintained and full of ancient cruft, yet something a company’s well-being relies on.
It’s absolutely infeasible to approach this with an ‘out with the old, in with the new’ attitude; management would balk at it. Again, the nice word ‘compromise’ must be mentioned. And this is exactly where the compatibility profile comes in: lots of high-profile customers simply cannot afford to port their software to an entirely different paradigm. The mere fact that a compatibility profile had to be established was a clear indicator that something was wrong with how the deprecation mechanism was used.

It’s not about the inability to solve a problem but about the inability to redesign an existing solution without blowing it up. Face it, GL 3.x was completely missing an efficient method for small, frequent buffer updates, which led to horrendous CPU-side primitive caching schemes and similar crutches to reduce the number of buffer uploads. I have written my share of those myself for other projects; all they did was cost a lot of time while providing absolutely no performance increase over immediate mode.
And frankly, this particular thing was the ONLY thing sorely missing from GL 3.x.

See above: the inability to just put some data into a buffer without some insane driver overhead. Yes, just an efficient method to replace immediate mode draw calls. You may ignore this problem as much as you like; that doesn’t change the cold hard fact that the life of our ‘big app’ depends on it.

Again: The code exists, the code needs to continue to exist, it’s one of the backbones of our company that this application continues working.
Again: It’s very old, it’s very crufty and today would be written in a different way.
Again: All of this doesn’t eliminate the fact that I have to deal with the code as it was written more than a decade ago and liberally expanded over the years.

It’s a simple question of economics - a rewrite would be too costly. There’s no point in discussing this. The decision has been made, and I have to deal with it and make do with what I can - which is merely picking out the immediate mode draw calls and replacing them with anything that’s compatible with a core profile and doesn’t bog down performance.

You cannot pull the rug out from under existing software in the vain hope that everyone can afford to take the time to reorganize all their data.

Huh? The point of persistent, coherent buffers was precisely to AVOID such schemes! Just write some data into a buffer, issue a draw call and go on, allowing a perfect 1:1 translation of existing immediate mode code without any need for restructuring and without the overhead of immediate mode’s inefficient way of specifying vertex data.
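To make “write some data into a buffer, issue a draw call and go on” concrete, here is a minimal sketch of the usual ring-buffer pattern. It assumes one large buffer created once with glBufferStorage (GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT) and mapped once with glMapBufferRange; each primitive is then a memcpy into the next free slot. The struct and function names are hypothetical, plain memory stands in for the mapped pointer so the logic can be exercised without a GL context, and the fence waits a real implementation needs are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ring-buffer suballocator over a persistently mapped buffer.
 * In real GL code `base` would be the pointer returned by glMapBufferRange();
 * here it is ordinary memory. */
typedef struct {
    unsigned char *base; /* start of the mapped range */
    size_t size;         /* total bytes in the ring */
    size_t head;         /* next free byte offset */
} ring_buffer;

/* Round `x` up to a multiple of `align` (align must be non-zero). */
static size_t align_up(size_t x, size_t align) {
    return (x + align - 1) / align * align;
}

/* Copy `bytes` of vertex data into the ring and return the byte offset it
 * was written to, or (size_t)-1 if the request can never fit.
 * NOTE: before overwriting a region the GPU may still be reading, a real
 * implementation must wait on a fence (glFenceSync / glClientWaitSync);
 * that is omitted here. */
static size_t ring_write(ring_buffer *r, const void *data, size_t bytes,
                         size_t align) {
    if (bytes > r->size) return (size_t)-1;
    size_t offset = align_up(r->head, align);
    if (offset + bytes > r->size) offset = 0; /* wrap around to the start */
    memcpy(r->base + offset, data, bytes);
    r->head = offset + bytes;
    return offset;
}
```

Passing the vertex size (20 bytes for x, y, z, s, t floats) as `align` keeps every offset a whole number of vertices, so `offset / 20` can be used as the `first` argument of glDrawArrays with the buffer bound once for the whole frame.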

thokra wrote:

Persistent mapping is an optimization, and it doesn’t make rewriting your hundreds of writes any easier. You, however, continue to state this perverted notion that persistently mapped buffers are the only viable remedy for something that was previously only adequately solvable with immediate mode … Have you ever had a look at the “Approaching Zero Driver Overhead” presentation from GDC14? Did you look at the code sample that transformed a non-persistent mapping into a persistent mapping? Your argument before was that you cannot replace immediate mode with anything other than persistently mapped buffers. If you’re so sure about what you’re saying, please explain the supposedly huge difference between an async mapping implementation and a persistent mapping implementation - because you didn’t say that async mapping was too slow because of implicit syncing inside the driver or something (and AFAIK that’s only reportedly so in the case of NVIDIA drivers, which really seem to hate MAP_UNSYNCHRONIZED), you said you couldn’t do it at all.

The problem with a non-persistent mapping (using glMapBufferRange) is that each time I want to write data to the buffer I have to lock the buffer, write some data into it, unlock it again, and issue a draw call (since a draw call may not source from a mapped buffer). And that process is SLOW!!! Sure, it’s doable, but it’s far from performant: it was significantly slower than immediate mode, to the point where it bogged down the app. The same goes for updating with glBuffer(Sub)Data. From day one of working with a core profile, my one and only gripe has been that a low-overhead buffer update mechanism had been completely overlooked; it was all geared toward large static buffers, forgetting that not everything is large and static and not all code is easily rewritten to keep data large and static.

That’s the main reason I jumped at persistent, coherent buffers: with those the code is actually FASTER than immediate mode, even on NVidia, where glBegin/glEnd is still fast.

Aside from performance in some edge cases, one of which our app unfortunately depends on, sure, you can do everything with GL 3.x core. (And from what I’ve learned, all these edge cases stem from the convenience of using immediate mode drawing as a simple ‘draw something to the screen’ function, so it has been used heavily in legacy code.)
The problem is that making it work may require a more extensive rewrite if you are dealing with legacy code from another generation. And it’s precisely that extensive rewrite that corporate programmers often can’t afford to take on.

Yes, unless that ‘little more work’ you are talking about is considered too much by management, in which case all your theories fall flat on their face with a loud ‘thump’.

Client-side vertex arrays, just like static vertex buffers, are nice when you can easily collect larger amounts of data. But they become close to useless if your primitives regularly consist of fewer than 10 vertices and, on top of that, are dynamically created and separated by frequent state changes that break a batch. Sure, you can keep collecting them, but you also have to collect the state along with them, and in the end you save no time compared to glBegin/glEnd. The world doesn’t consist entirely of 100+ vertex triangle strips.

No speculation. You seem to operate on the assumption that once the data is in the buffer it will stay there. Yes, in that case buffers are clearly the way to go.
But believe it or not, there are usage scenarios where optimizing the data’s path into the buffer matters far more than anything else. For a strictly CPU-bottlenecked app it doesn’t matter one bit how much data you can draw with a single draw call; all that matters is finding the fastest way to get your data onto the GPU - and that’s exactly my problem. Restructuring the code to allow better batching would add bookkeeping overhead that’s entirely on the CPU, where we are already at the limit and every small addition can be felt immediately.

TL;DR, I know. To make it easier to digest, I’ll post a summary separately.

Nikki_k · June 11, 2014, 4:38am

Let’s get back to the discussion about deprecation and its advantages and disadvantages. I firmly maintain that if something gets removed but at the same time has to be reinstated through a backdoor, something has gone horrendously wrong. If I want to deprecate stuff, I’d want to remove it eventually - permanently!
And to allow that you have to think twice about what features are in use, how they can be replaced and how much work needs to be invested to replace them.

But look at what happened: Stuff got deprecated. Fine!
But wait: there are tons of legacy apps that may want to use the new features - so let’s add an extension that brings back all of the old stuff.

Ugh…

Now, if things had been done seriously, at this point everyone should have stopped, thought about this for a moment - and then developed a way to actually remove the old stuff WITHOUT bringing it back through the backdoor! The moment the ARB_compatibility extension was established, the whole thing could have been considered a failure.

So it should be clear that the main reasons someone thought they NEEDED such an extension should have been addressed before actually removing anything.

malexander · June 11, 2014, 11:32am

Yes, we want our code to work on MacOSX and Intel hardware but what the theoreticians completely overlook is…

“Theoreticians” - this is a pretty bold assumption. The OpenGL Architecture Review Board consists of people who design graphics hardware, write graphics drivers, author graphics engines, and use the OpenGL API in applications.

…management also has a say in the matter, resulting in the following:

no rewrite from the ground up

no change of general program flow

no time consuming changes

In that case, you’ve got a pretty clear set of restrictions which would preclude you from doing much GL3- or GL4-specific work anyway, even with compatibility mode. While I’m sure they have their reasons, it seems a bit short-sighted to me. Dealing with third-party API changes is a normal part of software development, which software managers need to account for (usually as “software maintenance”). Legacy to Core just happens to be a larger change than most, and one which isn’t even being forced upon you.

However, under those development restrictions I don’t see any problem with setting out the hardware and platform requirements in your application’s system requirements (Windows - Nvidia or AMD).

You cannot pull the rug out from under existing software in the vain hope that everyone can afford to take the time to reorganize all their data.

Except that they didn’t. Compatibility mode remains for those cases, for GL implementors that are willing to support it. Even Apple has a GL 2.1 compatibility profile - you just can’t use GL4 features with it, which it sounds like your application couldn’t use anyway.

See above: the inability to just put some data into a buffer without some insane driver overhead. Yes, just an efficient method to replace immediate mode draw calls. You may ignore this problem as much as you like; that doesn’t change the cold hard fact that the life of our ‘big app’ depends on it.

Sounds like your app suffers from the “small batch problem”, which a lot of people in the industry are attempting to resolve (AMD with Mantle, Microsoft with DirectX 12, Nvidia with their 337 series GL driver). Nvidia does well in immediate mode because it has an excellent emulation layer which batches up all the vertices into buffers for you. They also optimize their display lists during compilation. However, your mileage will vary, as some immediate mode and display list implementations are quite a bit slower.

Current hardware simply doesn’t like small draw batches with rendering changes (GL state changes) between them, so it’s up to the software to optimize the draws and buffer submissions. You can either do it yourself or hope the driver does a good job. The draw batcher we wrote provides very similar performance in Core GL mode to our previous immediate mode code. Perhaps a bit of profiling is required? Especially if, as you say, your big app’s life depends upon it.
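A rough idea of what such a draw batcher does, as a minimal sketch with hypothetical names (this is not the batcher malexander describes, whose code isn’t shown): vertices from consecutive small primitives are appended to one CPU-side array, and a draw is only emitted when the required state changes or the array fills up. `state_id` stands in for whatever GL state a primitive needs.

```c
#include <stddef.h>
#include <string.h>

#define FLOATS_PER_VERTEX 5       /* x, y, z, s, t */
#define BATCH_MAX_VERTICES 4096

/* Hypothetical CPU-side batcher. In a real renderer batcher_flush() would
 * upload `vertices`, bind the state for `state_id` and issue ONE draw call;
 * here it only counts how many draws would have been issued. */
typedef struct {
    float vertices[BATCH_MAX_VERTICES * FLOATS_PER_VERTEX];
    size_t count;          /* vertices currently queued */
    int state_id;          /* state shared by the queued vertices */
    size_t draws_issued;   /* number of flushes that produced a draw */
} batcher;

static void batcher_init(batcher *b) {
    b->count = 0;
    b->state_id = -1;
    b->draws_issued = 0;
}

static void batcher_flush(batcher *b) {
    if (b->count == 0) return;
    b->draws_issued++;     /* real code: upload + bind state + draw */
    b->count = 0;
}

/* Queue `n` vertices (already expanded to independent triangles) that
 * need state `state_id`. Returns 0 if n can never fit in one batch. */
static int batcher_add(batcher *b, int state_id, const float *verts, size_t n) {
    if (n > BATCH_MAX_VERTICES) return 0;
    if (state_id != b->state_id || b->count + n > BATCH_MAX_VERTICES) {
        batcher_flush(b);  /* state change or full array ends the batch */
        b->state_id = state_id;
    }
    memcpy(&b->vertices[b->count * FLOATS_PER_VERTEX], verts,
           n * FLOATS_PER_VERTEX * sizeof(float));
    b->count += n;
    return 1;
}
```

With 100 triangles that all share one state this collapses to a single draw; if the state alternates on every triangle it still issues 100 draws, which is the situation described earlier in the thread where batching saves nothing over glBegin/glEnd.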

Nikki_k · June 30, 2014, 8:16am

Coming back to this mess, because I think I found a solution to my problem, though it’s far from anything I would have expected.

I have been experimenting with all kinds of buffer hacks, but to no avail: uploading buffers with GL 3.x’s feature set is horrendously slow no matter which upload function is used. It only works for a relatively small number of buffer uploads per frame, not in my case, where I need several thousand buffer uploads per frame. To solve this I would have had to cache the entire uniform state for hundreds of draw calls just to reduce the number of buffer uploads.

However, while working on something else, I noticed that uploading uniform arrays repeatedly is virtually free, with no perceptible performance loss.
So I had this crazy idea: instead of putting my vertices into a buffer object, put them into a uniform array and merely use a static vertex buffer to index this uniform array - and to my never-ending surprise, after a little tweaking it worked! On the systems I have tested so far, performance is nearly the same as with immediate mode functions - something none of the buffer-related methods even remotely managed. And the best thing: I do not have to mess around with caching state on the CPU to reduce the number of API calls.
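A sketch of how that setup might look, under assumptions the post doesn’t spell out: a GLSL 3.30 vertex shader, a float attribute named `a_index` fed from the static VBO, and the 5-float x, y, z, s, t layout mentioned later in the thread. All identifiers are hypothetical. Per draw call the application would pass the returned float count to glUniform1fv(location, count, staging) and then call glDrawArrays(mode, 0, n_vertices) with the index VBO bound, using the primitive’s original mode.

```c
#include <assert.h>
#include <stddef.h>

#define FLOATS_PER_VERTEX 5                                      /* x, y, z, s, t */
#define MAX_UNIFORM_VERTICES 20
#define UNIFORM_FLOATS (FLOATS_PER_VERTEX * MAX_UNIFORM_VERTICES) /* 100 */

/* Vertex shader assumed by this sketch; each vertex reads its own five
 * floats from the uniform array, selected by the static index attribute. */
static const char *uniform_vertex_shader =
    "#version 330 core\n"
    "uniform float u_verts[100];  // 20 vertices * (x, y, z, s, t)\n"
    "uniform mat4 u_mvp;\n"
    "in float a_index;            // static VBO: 0, 1, ..., 19\n"
    "out vec2 v_texcoord;\n"
    "void main() {\n"
    "    int base = int(a_index) * 5;\n"
    "    vec3 pos = vec3(u_verts[base], u_verts[base + 1], u_verts[base + 2]);\n"
    "    v_texcoord = vec2(u_verts[base + 3], u_verts[base + 4]);\n"
    "    gl_Position = u_mvp * vec4(pos, 1.0);\n"
    "}\n";

/* Contents of the static index VBO, filled once at startup. */
static void fill_index_buffer(float out[MAX_UNIFORM_VERTICES]) {
    for (int i = 0; i < MAX_UNIFORM_VERTICES; ++i) out[i] = (float)i;
}

/* Copy one primitive's interleaved vertices into the staging array and
 * return the number of floats to pass as `count` to glUniform1fv. */
static size_t pack_uniform_vertices(const float *xyzst, size_t n_vertices,
                                    float staging[UNIFORM_FLOATS]) {
    assert(n_vertices <= MAX_UNIFORM_VERTICES);
    size_t n_floats = n_vertices * FLOATS_PER_VERTEX;
    for (size_t i = 0; i < n_floats; ++i) staging[i] = xyzst[i];
    return n_floats;
}
```

Primitives with more than 20 vertices would need to be split or drawn another way; how the poster handled that isn’t stated.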

But now I ask myself: Why is buffer uploading so much slower than uniform uploading, to the point that in some extreme use cases it becomes completely worthless as a feature?

malexander · June 30, 2014, 10:42am

But now I ask myself: Why is buffer uploading so much slower than uniform uploading, to the point that in some extreme use cases it becomes completely worthless as a feature?

With the vertex buffers: are you just uploading buffer contents when drawing these small batches? Or are you also setting up the vertex state as well (glVertexAttribPointer)? I’d expect a buffer upload to be the same speed regardless of whether it’s filling a VBO or a UBO. Specifying the vertex state can be more expensive, though.

Also, some drivers are picky about the vertex formats used. AMD drivers, for example, don’t like vertex buffers with elements that aren’t 4-byte aligned (such as a vec2 of 8-bit components). Try using GL_KHR_debug to see if you’re getting any performance warnings back from the driver.

Nikki_k · June 30, 2014, 11:02am

It’s just uploading data, no change of vertex state.

The buffer format is 5 floats per vertex: x, y, z, s, t.

I don’t use a UBO, btw., just a plain, simple uniform array of 100 floats (enough to store 20 vertices) and glUniform1fv to upload my data. This way I easily manage to upload 40000 batches with 200000 vertices in total per frame, with no performance degradation compared to immediate mode.
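For scale, those figures work out as follows, assuming 4-byte floats and the 5-floats-per-vertex format above:

```latex
\begin{aligned}
\frac{200\,000\ \text{vertices}}{40\,000\ \text{batches}} &= 5\ \text{vertices per batch on average} \\
200\,000 \times 5\ \text{floats} \times 4\ \text{B} &= 4\,000\,000\ \text{B} \approx 4\ \text{MB per frame} \\
5\ \text{vertices} \times 5\ \text{floats} \times 4\ \text{B} &= 100\ \text{B per \texttt{glUniform1fv} call on average}
\end{aligned}
```

So the total data volume is modest; the hard part is the 40000 separate, roughly 100-byte updates per frame.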

Aleksandar · June 30, 2014, 11:06am

If you tell me whether your uniform values actually change between calls or not, I’ll probably be able to answer your question.

First, there is a limited number of uniforms you can use. It is usually about 4K fp entries for older cards (or 16K for new top models). That is 16 KB (up to 48 KB). A VBO, on the other hand, can be up to the size of available memory (several GB). That’s the first difference.

Second, the drivers optimize setting uniforms. If they are not changed, nothing is sent to the graphics card. Try modifying 16 KB of uniform space in each draw call. I bet it is more expensive than modifying a 16 KB VBO in a single glBufferSubData() call.
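The idea that unchanged uniforms cost nothing can be illustrated with a shadow copy kept on the application side. This is a minimal sketch with hypothetical names; it says nothing about how any particular driver actually implements its own change tracking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SHADOW_MAX_FLOATS 100

/* Hypothetical shadow copy of the last values sent for one uniform array. */
typedef struct {
    float last[SHADOW_MAX_FLOATS];
    size_t last_count;   /* 0 means "nothing tracked yet" */
} uniform_shadow;

/* Returns true if `values` differ from the previous upload and must be sent
 * (e.g. with glUniform1fv); updates the shadow copy in that case. */
static bool uniform_needs_upload(uniform_shadow *s, const float *values,
                                 size_t count) {
    if (count > SHADOW_MAX_FLOATS) {
        s->last_count = 0;   /* too large to track: forget the old copy */
        return true;
    }
    if (count > 0 && count == s->last_count &&
        memcmp(s->last, values, count * sizeof(float)) == 0)
        return false;        /* bit-identical to what was sent last time */
    memcpy(s->last, values, count * sizeof(float));
    s->last_count = count;
    return true;
}
```

In the scenario discussed here the array holds freshly generated vertices on every call, so a check like this would skip nothing.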

In any case, I’m glad you have found the solution for your problem!

Nikki_k · June 30, 2014, 1:57pm

Of course the uniform array changes! For each draw call it will contain the vertices that were generated.
It’s just that the performance of glUniform1fv is what I’d expect in this scenario: roughly the same as transferring the same amount of data via immediate mode, and somewhat slower than using a persistently mapped coherent buffer (as per GL_ARB_buffer_storage).

The buffer uploading must get hung up on some synchronization issue, but I’ve been unable to find out why. Of all the buffer upload methods I tried, glBufferSubData was the fastest one, but it still increased frame processing time from 20ms to 80ms for my 40000 draw call test scenario.
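Spread evenly over the test scenario, the reported slowdown amounts to roughly this much extra frame time per buffer update:

```latex
\frac{80\ \text{ms} - 20\ \text{ms}}{40\,000\ \text{draw calls}}
  = \frac{60\ \text{ms}}{40\,000}
  = 0.0015\ \text{ms} = 1.5\ \mu\text{s per glBufferSubData update}
```

That is small per call but adds up quickly at tens of thousands of calls per frame.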

system · October 19, 2021, 6:15pm

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.