Official feedback on OpenGL 4.0 thread

The slides from the GDC OpenGL 4.0 session are now available on the Khronos site.

Be optimistic, sampler objects are the first DSA API!
That’s a good sign for OpenGL 4.1
:o)

I had to design a renderer for a ten-year-old piece of software full of glBegin(*); glPushAttrib(ALL); reset(everything); selectors; OpenGL context switches when lost; etc…

Most of this was the result of programmers working with OpenGL without much understanding of it, because OpenGL was neither critical nor a limitation until then. I’m not saying they were bad programmers; the WORST OpenGL code I have seen was written by the BEST programmer I have worked with: a serious C++ god. The code was an absolute piece of art as C++ code but an absolute piece of trash as OpenGL code. He didn’t use OpenGL as his main tool and didn’t think it could cost much: “After all, OpenGL is a standard, it should be good if people agreed on it?”.

That’s why I disagree with NVIDIA’s position that the deprecation mechanism is useless. I’m not saying I don’t want to see the deprecated functions anymore; I’m saying that I feel more confident going to see developers and saying: “Here is OpenGL core; basically, you can rely on it to be efficient. On the other hand, there are more features in OpenGL compatibility but…”. A simple guideline that anyone can understand.

With DSA, it’s the same kind of idea. DSA is a concept that any good programmer will understand and be able to build a “good” software design around.

Also, I think that tying the spec’s major version to a hardware level greatly simplifies supporting a large number of platforms. It’s quite easy to understand for anyone, not just heavy OpenGL fans hunting for extensions all the time.


“A great solution is a simple solution. DSA for OpenGL 4.1.”
(My campaign slogan for DSA! :p)

Hi Rob,

Ah…MT driver + library that must reset state = terrible performance, I can see that.

For item (b), background loaders, is this a problem if the background loader is on a shared context on its own thread? I would [hope] that the background loader isn’t inducing a reconfiguration of the pipeline within the rendering context.

Of course, I am assuming that my limited set of GL calls to do background loading isn’t using the GPU at all, otherwise there is a hidden cost to async loading. That is, I am assuming that a glTexImage2D in a “loader context” without a real drawing of triangles won’t cause the driver to go through and actually update the pipeline configuration. This is a completely unvalidated assumption. :)

cheers
Ben

OpenGL calls are collected in a command list and can theoretically stack up to one frame of latency. That’s a usual strategy for conditional rendering with occlusion queries, so that the CPU never stalls waiting for the GPU to process something.
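As an illustration of that strategy, here is a rough sketch of the conditional rendering pattern (drawBoundingBox and drawExpensiveObject are just assumed helpers); the point is that the CPU never reads the query result back:

GLuint query = 0;
glGenQueries(1, &query);

// Render cheap proxy geometry while counting the samples that pass
glBeginQuery(GL_SAMPLES_PASSED, query);
drawBoundingBox();
glEndQuery(GL_SAMPLES_PASSED);

// The GPU decides whether to execute the real draw; GL_QUERY_NO_WAIT avoids a CPU stall
glBeginConditionalRender(query, GL_QUERY_NO_WAIT);
drawExpensiveObject();
glEndConditionalRender();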

Queries and glGet are likely to imply a stall, not glTexImage2D.
That’s another reason why DSA saves us! We don’t have to use the save (which implies a glGet) / restore paradigm anymore.

There are several discussions on this forum about why we love DSA; I’d like to hear why some people don’t.

A code sample that shows the problem:


GLuint Texture = 0;
// Some code
glBindTexture(GL_TEXTURE_2D, Texture);
// Some code
GLint ActiveTexture = 0;
glGetIntegerv(GL_TEXTURE_BINDING_2D, &ActiveTexture);

The full command list needs to be processed to be sure that Texture == ActiveTexture. Maybe in “some code” another texture has been bound.
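For comparison, a rough sketch of how the same situation looks with EXT_direct_state_access; since the texture is named explicitly in each call, there is nothing to query, save or restore (the actual parameters here are only placeholders):

GLuint Texture = 0;
glGenTextures(1, &Texture);
// Set up and fill the texture without ever binding it
glTextureParameteriEXT(Texture, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTextureImage2DEXT(Texture, GL_TEXTURE_2D, 0, GL_RGBA8, 256, 256, 0,
                    GL_RGBA, GL_UNSIGNED_BYTE, NULL);
// No glGetIntegerv(GL_TEXTURE_BINDING_2D, ...) is needed, so no potential stall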

The GLSL 3.3 spec (and GLSL 4.0) say:

Added Appendix A to describe include tree and path semantics/syntax for both the language and the API specifications.

But the document does not include appendix A.

Thanks,
Patrick

If DSA is all about capturing/restoring state, why not provide “state block objects”?

Something like:


GLuint stateblock = 0;
glGenStateBlocks(1, &stateblock);
...
...
glCaptureState(GL_COLOR_BUFFER_BIT|GL_FRAMEBUFFER_BIT, stateblock);
...
glBindFramebuffer(GL_FRAMEBUFFER, myFBO);
glColorMask(1,1,1,1);
glClearColor(0,0,0,0);
glClear(GL_COLOR_BUFFER_BIT);
...

glRestoreState(stateblock);

This lets you interact nicely with “external” code without ever needing to call glGetXXX(). Additionally, state blocks could be a very quick way to set complex state, just like display lists once did:


GLuint savestate = 0, mystate = 0;
glGenStateBlocks(1, &savestate);
glGenStateBlocks(1, &mystate);

glBindFramebuffer(GL_FRAMEBUFFER, myFBO);
glColorMask(1,1,1,1);
glClearColor(0,0,0,0);
glCaptureState(GL_COLOR_BUFFER_BIT|GL_FRAMEBUFFER_BIT, mystate);
...

//then the actual execution might look this way
glCaptureState(GL_COLOR_BUFFER_BIT|GL_FRAMEBUFFER_BIT, savestate);
glRestoreState(mystate);
glClear(GL_COLOR_BUFFER_BIT);
glRestoreState(savestate);

@Shynet: It looks like glPushAttrib … I’m not sure.

I have submitted a state object idea a few times, but more as a replacement for display lists. I sometimes use a display list as a state object, which is more efficient than not using one. For that I designed some “macro objects”: when I “bind” one of those, I call the list.

It’s possible that my design is efficient only because I work at the macro-object level, which probably implies more calls than strictly necessary, but still: nice and handy!
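To make the idea concrete, here is a minimal sketch of such a “macro object” built on (now deprecated) display lists; the state captured here is arbitrary:

// Compile a block of state changes once...
GLuint MacroState = glGenLists(1);
glNewList(MacroState, GL_COMPILE);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glDepthMask(GL_FALSE);
glEndList();

// ..."binding" the macro object is then a single call
glCallList(MacroState);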

The thing is just that display lists are deprecated, and it would be nice to replace them!

Actually, sampler objects are kind of a step forward in that direction.
(Like VAO? ouchhh)

Something is bothering me: GL_ARB_gpu_shader_fp64 is part of the OpenGL 4.0 specification… That means that my Radeon 5770 is not an OpenGL 4.0 card…

I went through the spec to check if there is a rule to relax double support… but it doesn’t seem to be the case.

Considering that doubles are not that useful yet, I’m quite sceptical about this choice. It might slow down OpenGL 4.0 adoption and push quite a few developers toward OpenGL 3.x + a large bunch of extensions.

The good thing with one OpenGL major version per hardware generation is:

  • 1 code path for OpenGL 2.1 for GeForce 6 >, Radeon X >
  • 1 code path for OpenGL 3.x for GeForce 8 >, Radeon HD >
  • 1 code path for OpenGL 4.x for GeForce GT* 4** >, Radeon 58** >

Because of this we end up with an extra code path: OpenGL 4.WEAK, which will be OpenGL 3.x + a load of extensions for GeForce GT* 470 < and Radeon 57** <=.

Not cool.

If the idea was to make OpenGL 4.0 for high-end graphics only, maybe a high-end profile for Quadro, FireGL and high-end GeForce and Radeon cards would have been great. OpenCL has such an option in its spec, which I quite like.

Maybe the Radeon HD 57xx does support doubles, but given DX’s nature it’s not a mentionable/marketable feature?
(disclaimer: I haven’t checked any in-depth docs on the 57xx)

Edit: http://www.geeks3d.com/20091014/radeon-hd-5770-has-no-double-precision-floating-point-support/

Maybe there’s a glGetIntegerv() call to check precision, just like the texture-multisample depth-sample counts in GL 3.2?

The only query the extension defines is through glGetActiveUniform… (and transform feedback, but…). I don’t think it would be of any use.
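For the record, a sketch of what that query gives you (“program” is assumed to be a linked program whose first active uniform is declared as a double); it only reports the declared type, which says nothing about the precision the hardware actually uses:

GLint size = 0;
GLenum type = 0;
GLchar name[128];
glGetActiveUniform(program, 0, sizeof(name), NULL, &size, &type, name);
if (type == GL_DOUBLE || type == GL_DOUBLE_VEC4 || type == GL_DOUBLE_MAT4)
{
    // The uniform is declared with a double type, nothing more
}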

Be optimistic, sampler object is the first DSA API!

The second. Sync objects are the first.

Because of this we end up with an extra code path: OpenGL 4.WEAK, which will be OpenGL 3.x + a load of extensions for GeForce GT* 470 < and Radeon 57** <=.

Oh well. It could be worse. At least OpenGL allows you to get at that 3.x + extensions.

Also, please note: most of the 4.0 features are core extensions, so you don’t need a true codepath. Your code will call the same function pointers with the same enum values. All you need to do is check for version 4.0 or the extension.
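For example, a rough sketch of that check in a 3.x context (glGetStringi has been available since 3.0; string.h is assumed for strcmp):

GLint major = 0, minor = 0, numExtensions = 0;
glGetIntegerv(GL_MAJOR_VERSION, &major);
glGetIntegerv(GL_MINOR_VERSION, &minor);

int hasTessellation = (major >= 4);
if (!hasTessellation)
{
    glGetIntegerv(GL_NUM_EXTENSIONS, &numExtensions);
    for (GLint i = 0; i < numExtensions; ++i)
        if (strcmp((const char*)glGetStringi(GL_EXTENSIONS, i),
                   "GL_ARB_tessellation_shader") == 0)
            hasTessellation = 1;
}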

I wouldn’t call it that. Sync objects are something… “different”, since a sync object isn’t a name but a structure; let’s say, compared to the DSA extension.

Because of this we end up with an extra code path: OpenGL 4.WEAK, which will be OpenGL 3.x + a load of extensions for GeForce GT* 470 < and Radeon 57** <=.

Oh well. It could be worse. At least OpenGL allows you to get at that 3.x + extensions.

Also, please note: most of the 4.0 features are core extensions, so you don’t need a true codepath. Your code will call the same function pointers with the same enum values. All you need to do is check for version 4.0 or the extension.

True, I guess (especially because I don’t see myself writing an OpenGL 4.0 code path anytime soon; OpenGL 4.WEAK rules!).

I would say that, just like Rob said, this series of OpenGL spec releases is clearer for everyone and a better deal than the “load of extension specs”. A “load of specs and a load of extension specs”, I’m not so sure about.

Groovounet - to be clear, I have nothing against DSA as a feature; since my app is monolithic it simply won’t benefit in the major way that library-based apps might. (By being monolithic, we can simply shadow everything - the win might be in total function calls but we wouldn’t be removing pipeline stalls.)

My first post is confusing because I completely misinterpreted the community’s meaning in “MT” rendering, which I guess is understood to mean a multi-threaded GL driver, split between the user thread and a back-end that collects and executes calls later, providing parallelism between app code (iterating the scene graph) and driver code (validating hardware state change, preparing batches, etc.).

The multi-threaded rendering I would like is different and not a function of DSA: I would like to be able to render all six sides of an environment cube map in parallel from six threads. :)

cheers
ben

I would say that, just like Rob said, this series of OpenGL spec releases is clearer for everyone and a better deal than the “load of extension specs”.

I wouldn’t. It’s much easier to look at an extension specification to figure out what it is doing than to see exactly how a specific feature like sampler_objects is defined in GL 3.3.

I would like to be able to render all six sides of an environment cube map in parallel from six threads.

You can already do that with geometry shaders and layered framebuffer objects.

Before asking for something, you might want to make sure you don’t already have it ;)
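For anyone wondering what that looks like, a minimal sketch of the setup (“fbo”, “cubeTex”, “cubeMapProgram” and “drawScene” are assumed to exist; the geometry shader itself is only described in the comments):

// Attach the whole cube map as one layered color attachment (GL 3.2 glFramebufferTexture)
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, cubeTex, 0);
glDrawBuffer(GL_COLOR_ATTACHMENT0);

// In the geometry shader, each input triangle is emitted six times with
//   gl_Layer = face;   // face = 0..5 selects the cube face to rasterize into
// transformed by the matching per-face view-projection matrix.
glUseProgram(cubeMapProgram);
drawScene(); // one pass of the scene ends up on all six faces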

Well, to be pedantic, that’s not exactly what he said.

Using geometry shaders and layered FBOs lets you render several things at once from ONE thread. He said he would like to render from several threads, like using the GPU in a true multi-tasking way. Actually, that would be like having six contexts (though the GPU would still process everything in sequence, I guess).

Of course I assume that Alfonse’s suggestion actually solves the problem at hand. I don’t see how truly multi-threading the GPU would be of value to anyone.

@bsupnik: Correct, when people talk about “multi-threaded rendering” they are concerned with offloading CPU computations to different cores. Drivers do that by queuing commands and executing them in their own thread. Applications would like to do things like object-culling and preparing the commands for rendering them in a separate thread, such that the main-thread (the one that is most likely CPU-bound) is freed up.

I have an application where entity culling can (depending on the view direction) become a serious bottleneck. But since it is intertwined with rendering, it is not easy to offload it to another thread. If I could create a command list, I could take the whole complex piece of code, put it into another thread and only synchronize at one spot, such that the main thread only takes the command list and executes it, with no further computation to be done.
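Purely hypothetical, but something along these lines is what I mean (none of these glxx* functions exist; they just stand in for a D3D11-style deferred-context/command-list API):

// Worker thread: cull and record, with no real GPU work issued yet
GLuint cmdList = glxxCreateCommandList();               // hypothetical
glxxBeginCommandList(cmdList);                          // hypothetical
cullAndRecordVisibleObjects();                          // issues ordinary GL draw calls
glxxEndCommandList(cmdList);                            // hypothetical

// Main thread: a single synchronization point, then replay the recorded commands
glxxExecuteCommandList(cmdList);                        // hypothetical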

I would really like to see something like D3D11’s command buffers in OpenGL 4.1. Though I assume that for that to work well, the API state management needs to be thoroughly cleaned up. And that is the one thing that the ARB (or the IHVs that don’t want to rewrite their drivers) has been running from so far. But well, maybe they will come up with a good idea that does not require a major API change. With the latest changes they have proven to be quite clever about it.

Jan.

Not a request for a GL feature, but just about the spec itself. GL4 added tessellation, but the spec’s pipeline would have been much easier to read and understand if a diagram of all the shader stages had been included, preferably something nicer than the ASCII-art diagram found in ARB_tessellation_shader, to make it clear when and if which shaders are run.

Same story for transform feedback and vertex streams.

Additionally, a little more spelling out of the instancing divisor functionality. I liked how the spec explained instancing using the non-existent functions “DrawArraysOneInstance” and “DrawElementsOneInstance”; maybe go a step further and explicitly and clearly state the role of the divisor in VertexAttribDivisor.
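For instance, a short sketch of the divisor semantics being discussed (“instanceAttribLoc”, “perInstanceBuffer” and the counts are assumed set up elsewhere): with a divisor of 1 the attribute advances once per instance instead of once per vertex; with a divisor of 2 it would advance every two instances.

glBindBuffer(GL_ARRAY_BUFFER, perInstanceBuffer);
glEnableVertexAttribArray(instanceAttribLoc);
glVertexAttribPointer(instanceAttribLoc, 4, GL_FLOAT, GL_FALSE, 0, 0);
glVertexAttribDivisor(instanceAttribLoc, 1);   // 0 = per vertex, 1 = per instance

glDrawArraysInstanced(GL_TRIANGLES, 0, vertexCount, instanceCount);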

Giggles, people still pining for DSA. For those saying it is just syntactic sugar, look at slide 65 of http://www.slideshare.net/Mark_Kilgard/opengl-32-and-more and see that it is more than just a convenience when you are not tracking bound GL state yourself.

The new blending stuff was quite unexpected; does anyone know if AMD/ATI’s 5xxx cards can actually do that? The new funky blending is also in GL 3.3; can the GeForce 8 series do that too?
[note that in GL3.x the blending parameters cannot be set per render target though, only enabling and disabling can be set per render target]
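To spell out that note, a small sketch of the difference (function availability depends on version/extension support, of course):

// GL 3.x: blending can be toggled per draw buffer...
glEnablei(GL_BLEND, 0);     // blend into color attachment 0
glDisablei(GL_BLEND, 1);    // ...but not into color attachment 1
// ...while the blend function/equation is still global:
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

// GL 4.0 / ARB_draw_buffers_blend: function and equation can differ per buffer
glBlendFunci(0, GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glBlendFunci(1, GL_ONE, GL_ONE);
glBlendEquationi(1, GL_FUNC_ADD);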

All in all, I am very, very happy with and very appreciative of the spec writers and the IHVs that support GL, and of how actively the GL API is being maintained and updated (especially when compared to the dark days between GL 2.x and GL 3.0).

You made me look at the specs on AMD’s website, because I was convinced the entire Evergreen line could do double-precision calculations… but double-precision performance in GFLOPS is not listed for these cards, so perhaps they do not support it!

It’s just that only some of the HD 5xxx line supports double-precision, not all of them. So only some can offer GL 4.0 support.

I have been assured that AMD is going to support OpenGL 4.0 on the entire Radeon 5*** line… I asked how… but I didn’t get any answer…

Smells bad, I don’t like that!

I would not be surprised to see AMD claiming support for OpenGL 4.0 through the driver queries even when it’s not actually the case! We have had quite a few stories like that from AMD, NVIDIA and Intel (Intel is king on that matter!). We will see when the drivers are released… within 6 months?

Another possibility is that AMD claims to support doubles but all the double uniforms and variables are actually computed at single precision… With the relaxed implicit conversions, implementing double would basically be as simple as doing something like this:

#define double float

I’d rather test OpenGL 3.x + a lot of extensions than that. However, from a marketing point of view, 4.0 is bigger than 3.3 + almost all the extensions.

So, say they all support GL 3.3, and they all support the GL4-class extensions that can work on those chips. Is that a usable combination for apps that don’t need double precision but do want to use tessellation, for example?