Official feedback on OpenGL 4.0 thread

Question regarding OGL 4 and 3.3: I have a GTX 280; would that support OGL 4 features, or would I just use 3.3? I have heard reports of DX11 features running on a DX10 card, so that’s why I ask.

No OpenGL 4.0 for DirectX 10(.1)-class hardware. All functionality from OpenGL 4.0 that can run on DirectX 10(.1) hardware is available in OpenGL 3.3. An example of a feature you cannot use with your hardware is the new tessellation control shader stage, which is therefore only found in OpenGL 4.0 (and not in OpenGL 3.3).
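(For anyone wondering what that new stage looks like, here is a minimal sketch of a tessellation control shader, written as a GLSL 4.00 source string; the surrounding program setup and a glext.h-style header are assumed.)

```c
/* Minimal GLSL 4.00 tessellation control shader (as a C string): it sets the
   tessellation levels and passes the patch vertices through unchanged. */
const char *tcsSource =
    "#version 400 core\n"
    "layout(vertices = 3) out;\n"
    "void main() {\n"
    "    if (gl_InvocationID == 0) {\n"
    "        gl_TessLevelInner[0] = 4.0;\n"
    "        gl_TessLevelOuter[0] = 4.0;\n"
    "        gl_TessLevelOuter[1] = 4.0;\n"
    "        gl_TessLevelOuter[2] = 4.0;\n"
    "    }\n"
    "    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;\n"
    "}\n";
/* Compiled with glCreateShader(GL_TESS_CONTROL_SHADER); the stage needs a
   GL 4.0 (DX11-class) context, which is why it cannot appear in 3.3. */
```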

And now I’d like to present the most awesome thing in the OpenGL 3.3 specification:

Oh, and FYI: glDrawElementsOneInstance also does not exist :wink: Are there any other functions in OpenGL that do not exist that the ARB would like to tell us about? :smiley:

Funny you should mention it, but it was my dream that the next release of the standard would actually include two versions: 3.3 for DX10(.1) hardware and 4.0 for DX11. And it actually came true! Thanks Khronos!

Also thanks for GL_ARB_draw_indirect, which will enable writing fully GPU-accelerated game engines, and for GL_ARB_shader_subroutine, which enables modular shader development. Great work!
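To illustrate why GL_ARB_draw_indirect matters for GPU-driven engines, here is a minimal sketch (the command buffer is filled on the CPU here for brevity, but a GPU pass could write it just as well; a loader exposing the GL 4.0 entry points is assumed):

```c
/* Sketch: the draw parameters live in a buffer object, so a GPU pass can
   generate them and the CPU only issues the dispatch (GL_ARB_draw_indirect). */
typedef struct {
    GLuint count;               /* vertices per instance */
    GLuint primCount;           /* number of instances */
    GLuint first;               /* first vertex */
    GLuint reservedMustBeZero;
} DrawArraysIndirectCommand;

void drawIndirectExample(void)
{
    DrawArraysIndirectCommand cmd = { 36, 1024, 0, 0 };
    GLuint buf;

    glGenBuffers(1, &buf);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, buf);
    glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(cmd), &cmd, GL_DYNAMIC_DRAW);

    /* NULL means "offset 0 into the bound GL_DRAW_INDIRECT_BUFFER". */
    glDrawArraysIndirect(GL_TRIANGLES, (const void *)0);
}
```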

How are subroutines different from just having a switch statement that calls different functions depending on an integer uniform?

modularity
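To make the one-word answer concrete: with GL_ARB_shader_subroutine the implementation is picked by name at run time, so new variants can be added without editing a master switch in the shader and without relinking the program. A rough sketch (the shader and function names are made up):

```c
/* GLSL 4.00 fragment-shader excerpt, as a C string (names are hypothetical). */
const char *fsSource =
    "#version 400 core\n"
    "subroutine vec4 ShadeModel(vec3 n);\n"
    "subroutine uniform ShadeModel shade;\n"
    "subroutine(ShadeModel) vec4 shadeLambert(vec3 n) { return vec4(n * 0.5 + 0.5, 1.0); }\n"
    "subroutine(ShadeModel) vec4 shadePhong(vec3 n)   { return vec4(1.0); }\n"
    "in vec3 normal;\n"
    "out vec4 color;\n"
    "void main() { color = shade(normal); }\n";

/* C side: swap the implementation by name, per draw call, without relinking
   and without a uniform-driven switch inside the shader. */
void selectShading(GLuint program, const char *funcName)
{
    GLint  loc   = glGetSubroutineUniformLocation(program, GL_FRAGMENT_SHADER, "shade");
    GLuint index = glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, funcName);
    GLuint indices[1];

    indices[loc] = index;   /* one entry per active subroutine uniform location */
    glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 1, indices);
}
```

So selectShading(program, "shadePhong") picks the Phong variant; a library could register additional subroutine(ShadeModel) functions without the host code ever growing a switch.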

Are there likely to be 3.4, 3.5, … releases? For example, when 4.1 comes out, will 4.1 features not requiring GL 4 hardware be put into a core 3.4 spec? I’m assuming (or at least hoping) the answer is yes.

Long term, wouldn’t this get messy after several major GL releases?

Regards,
Patrick

They could just do it with core extensions. For example, if they have shader separation, they could just make an ARB_program_separate core extension rather than a point release.

They didn’t make GL 2.2 just so that 2.1 implementations could use VAOs; they just made a VAO extension.

I’ve advocated for having more core releases because it clarifies the work each implementor must sign up for in order to keep pace. IMO, in the past there were too few core releases and way too many vendor extensions, which led to a lot of developer issues. Looking at the present, it’s really no big deal whether a modest number of extensions appear on top of 3.3 or a 3.4 appears - neither would change the set of supported hardware, assuming the vendors you care about implement either path completely.

Looked at another way: say there were still a couple dozen features of DX10-level hardware that GL 3.x has not yet exposed (I don’t think there are, but just for discussion) - there would be no harm in a 3.4 / 3.5 / 3.6 addressing those over time, as long as you didn’t have to wait a couple of years to get there.

It’s been about two releases a year for the last two years; IMO this is a sensible cadence that should continue, which in turn reinforces developer confidence.

I guess I’m saying that timeliness and cross-vendor coherency hold more value for me than the distinction between core and extension. An example of this is anisotropic filtering: it’s not in core due to some long-standing IP conflict, the details of which escape me. It doesn’t matter, though, because most implementations have it.
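(For completeness, this is roughly what using it looks like in practice; since it is an extension rather than core, you check for it first. A sketch only, assuming the EXT_texture_filter_anisotropic tokens from glext.h:)

```c
#include <string.h>

/* Turn on anisotropic filtering for the currently bound 2D texture if the
   driver exposes the extension (in a 3.x core context, enumerate extensions
   with glGetStringi instead of parsing GL_EXTENSIONS). */
void enableAniso(void)
{
    const char *exts = (const char *)glGetString(GL_EXTENSIONS);
    if (exts && strstr(exts, "GL_EXT_texture_filter_anisotropic")) {
        GLfloat maxAniso = 1.0f;
        glGetFloatv(GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT, &maxAniso);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAX_ANISOTROPY_EXT, maxAniso);
    }
}
```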

I agree with Rob that the two core releases they have now are great: GL3 for DX10 hardware and GL4 for DX11 hardware. Anything that isn’t covered yet can still be added in 3.3-3.x, and the same goes for GL 4.0-4.x when DX12 hardware comes out in a few years! :slight_smile:

IMHO, instead of making multiple API versions (GL2, GL3, GL4, GLES, GLES2, and others), they could use profiles - one API, many profiles. Like this:
GL_CONTEXT_GL2_HARDWARE_PROFILE_BIT_ARB - for DX9 hardware
GL_CONTEXT_GL3_HARDWARE_PROFILE_BIT_ARB - for DX10(.1) hardware
GL_CONTEXT_GL4_HARDWARE_PROFILE_BIT_ARB - for DX11 hardware
GL_CONTEXT_GLES_HARDWARE_PROFILE_BIT_ARB - for embedded systems
GL_CONTEXT_GLES2_HARDWARE_PROFILE_BIT_ARB - for embedded systems

In the next release of the OpenGL API, some profiles could be deprecated or supplemented with new features.
For example, OpenGL 4.1:
ARB_texture_barrier functionality added to core for the GL_CONTEXT_GL3_HARDWARE_PROFILE_BIT_ARB and GL_CONTEXT_GL4_HARDWARE_PROFILE_BIT_ARB profiles.

Or something like this. :slight_smile:

What does that matter? You can already specify a version number to wglCreateContextAttribsARB / glXCreateContextAttribsARB. Why would you need a special bit to say you want GL3 when you just asked for a GL 3.3 context?
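For reference, a sketch of the WGL path (GLX is analogous): the version pair already pins down the feature set, so no separate hardware-profile bit is needed. The function pointer is assumed to have been fetched with wglGetProcAddress beforehand.

```c
#include <windows.h>
#include <GL/gl.h>
#include <GL/wglext.h>   /* WGL_CONTEXT_* tokens and PFNWGLCREATECONTEXTATTRIBSARBPROC */

/* Request a 3.3 core context directly by version number. */
HGLRC createContext33(HDC hdc, PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB)
{
    const int attribs[] = {
        WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
        WGL_CONTEXT_MINOR_VERSION_ARB, 3,
        WGL_CONTEXT_PROFILE_MASK_ARB,  WGL_CONTEXT_CORE_PROFILE_BIT_ARB,
        0
    };
    return wglCreateContextAttribsARB(hdc, NULL /* no shared context */, attribs);
}
```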

One important point is that with version numbers (instead of profiles), vendors can simply write “supports OpenGL 4.0” on their products. That is very important for competition, because when some other vendor is still stuck at version 3.2, some people won’t buy their hardware, even if they don’t know anything about OpenGL. A bigger number means more interest from customers (just as with DX).

That forces vendors to adopt OpenGL more quickly, and that in turn is a good thing for developers.

Jan.

Hi, I don’t understand how, with OpenGL 4.0, it is possible to achieve MT rendering (or rather, to submit n commands in parallel) to the OpenGL driver without a DSA API.
Could you please explain?
Does the OpenGL specification not only force a single thread to issue commands, but even an “isothread” (meaning that the only thread allowed to issue OpenGL commands within a process is the thread that created the context, an even more restrictive condition than simply being single-threaded)?

If it is possible, in which cases?
I ask because some years ago one of my OpenGL programs loaded a lot of images as textures. I was able to decode the JPG files into memory on separate threads, but to upload them as OpenGL textures I had to use the thread that created the context; otherwise I experienced unexpected behaviour.

Could you please help me better understand?

Cheers,

P.S. I use the term iso(thread) because it is Greek for “the same and only”.

Hi Y’all,

I agree with Rob mostly - it’s certainly very nice to have DX11-type functionality in ARB specs slated for core now, so early on. With DX10 we had NV vendor specs for a long time, and it wasn’t clear how many would make it to core. For an app developer this introduces uncertainty into the road map.

Re: DSA and MT rendering, it is not at all clear to me what MT rendering case would be solved. This is my understanding of the situation:

  • As of now, you can render to different render targets from different threads using multiple shared contexts. If you use “state shadowing” to avoid unnecessary binds/state-selector changes, you’ll need to use thread-local storage for the shadowing. One context, one thread, one FBO, off we go (see the sketch after this list).

  • As of now, this will not actually give you parallel rasterization on the card, because modern GPUs (1) have only one command interpreter to receive and dispatch instructions and (2) allocate all of their shader units in one big swarm to work through that rasterization. If anyone knows of a GPU that has gone past this idiom, please shout at me.

  • If you do an MT render from multiple threads/shared contexts, what will really happen is that the driver builds up multiple command buffers (one for each context) and the GPU executes them round-robin, with a context switch in between…pretty much the same way a single-core machine executes multiple apps. I do not know what the context-switch cost or granularity is.

  • Because drivers spend at least some time building up command buffers, a CPU bound app that spends a lot of time in the driver filling command buffers might in theory see some benefit from being able to fill command buffers faster from multiple threads. I have not tried this yet, and I do not know if driver sync overhead or the cost of on-GPU context switches will destroy this “advantage.” It is my understanding that a GPU context switch isn’t quite the performance nightmare it was 10 years ago.
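Here is the sketch referred to in the first bullet above: a worker thread that owns a shared context and renders into its own FBO. This is a WGL-flavoured sketch with illustrative names; the GL 3.x entry points are assumed to be loaded (GLEW or similar), and the context is assumed to have been created with the main context as its share partner.

```c
#include <windows.h>
#include <GL/glew.h>

/* One shared context per worker thread, each with its own FBO.  Shadowed state
   must live in thread-local storage, since every context has its own bind points. */
typedef struct {
    HDC    hdc;      /* device context for this thread */
    HGLRC  context;  /* created with the main context as share partner */
    GLuint fbo;      /* this thread's private render target */
} WorkerGL;

static __declspec(thread) GLuint g_shadowedProgram = 0;   /* per-thread state shadow */

DWORD WINAPI workerRender(LPVOID param)
{
    WorkerGL *gl = (WorkerGL *)param;

    wglMakeCurrent(gl->hdc, gl->context);          /* one context, one thread */
    glBindFramebuffer(GL_FRAMEBUFFER, gl->fbo);    /* one FBO per thread */

    /* ... issue draw calls here; textures, buffers and programs are shared with
       the main context, but container objects (FBOs, VAOs) are per-context ... */

    glFlush();
    wglMakeCurrent(NULL, NULL);
    return 0;
}
```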

If the goal is MT rendering to a single render target/FBO, I don’t know how that will ever work, as serialized, in-order writes to the framebuffer are pretty close to the heart of OpenGL. I don’t think DSA would address that either.

As a final side note, I’m not as excited about DSA as everyone else seems to be. (Of course, X-Plane doesn’t use any GL middleware, so we can shadow state selectors and go home happy. :slight_smile:)

In particular, every time I’ve taken a careful look at what state change actually does, I see the same thing: state change hurts like hell. Touching state just makes the driver do a ton of work. So I’m not that interested in making state change any easier…rather I am interested in finding ways to avoid changing state at all (or changing the ratio of content emitted to state change).

In other words, what’s annoying about selecting a VBO and then setting the vertex pointer is not the multiple calls to set a VBO/vertex-pointer pair…it’s the fact that the driver goes through hell every time you change the vertex pointer. Extensions like VBO index base offsets (which allow for better windowing within a single VBO) address the heart of the problem.
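(If that is a reference to ARB_draw_elements_base_vertex, which is my assumption, the idea looks roughly like this: two meshes packed into one VBO/IBO pair are drawn without re-specifying any vertex pointers in between.)

```c
/* Sketch: draw two meshes packed into one VBO/IBO pair without touching the
   vertex-attribute pointers between draws (ARB_draw_elements_base_vertex). */
void drawPackedMeshes(GLsizei countA, GLsizei countB, GLint meshBVertexStart)
{
    /* VBO, IBO and attribute pointers were set up once, elsewhere. */
    glDrawElementsBaseVertex(GL_TRIANGLES, countA, GL_UNSIGNED_SHORT, (void *)0, 0);

    /* Second mesh: same bindings; the indices are rebased by meshBVertexStart on the fly. */
    glDrawElementsBaseVertex(GL_TRIANGLES, countB, GL_UNSIGNED_SHORT,
                             (void *)(countA * sizeof(GLushort)), meshBVertexStart);
}
```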

Cheers
Ben

I agree with the mentioned benefits of .x versions over just extensions. Besides guaranteeing vendor support and the marketing benefits, it is also nice for developers to be able to create an x.y context and know what features to expect.

Perhaps by the time GL5 comes out, people will be saying DX12 for GL5 hardware. :slight_smile:

Regards,
Patrick

to load them as OpenGL textures, I had to use the thread that created the context otherwise I’d experience an unexpected behaviour.

You want to use the PBO extension:
http://www.opengl.org/registry/specs/ARB/pixel_buffer_object.txt
See also :
http://www.songho.ca/opengl/gl_pbo.html
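A rough sketch of the pattern those links describe, assuming RGBA8 data: the JPG decode can happen on any thread, and only the GL calls below need to run on the context thread.

```c
#include <string.h>   /* memcpy */

/* Stage the decoded image through a pixel-unpack buffer so the final
   glTexImage2D is a GPU-side copy (GL_ARB_pixel_buffer_object). */
GLuint uploadViaPBO(const void *pixels, int width, int height)
{
    GLuint pbo, tex;
    GLsizeiptr size = (GLsizeiptr)width * height * 4;   /* RGBA8 assumed */

    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);

    /* With a bit more plumbing, a worker thread could fill the mapped pointer instead. */
    void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    memcpy(dst, pixels, (size_t)size);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    /* While a PBO is bound to GL_PIXEL_UNPACK_BUFFER, the last argument is an
       offset into that buffer, not a client-memory pointer. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    glDeleteBuffers(1, &pbo);
    return tex;
}
```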

In gl.spec, Indexub and Indexubv both have a category of VERSION_1_1_DEPRECATED, but neither has a deprecated property as the others with that category do.

I think the value of DSA when used on an MT driver has to do with these issues:

a) MT drivers don’t like it when you query and may inflict big delays on your thread.

b) Currently, you have to perturb the drawing state (via binds) to do things to objects, like TexImage. Background texture loaders may run into things like this.

c) So if you want to do things to objects without perturbing the drawing state, you need to save and restore bind points.

d) If you are not in a position where you have done your own shadowing (so that you know what the bind point was “supposed to be set back to” for drawing), then you have to query. This can be particularly difficult for things like in-game overlays that are implemented as a separate library without knowing anything about the code base they are co-resident with.

DSA allows for directly communicating changes to objects without altering the drawing state, and not having to query/reset bind points that affect the drawing state.
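To make (b) and (c) concrete, here is a small sketch: the classic bind-to-edit path first, then the same update via EXT_direct_state_access.

```c
/* Bind-to-edit: updating a texture perturbs the drawing state, so the previous
   binding has to be shadowed or queried and then restored. */
void updateClassic(GLuint tex, int w, int h, const void *pixels)
{
    GLint previous;
    glGetIntegerv(GL_TEXTURE_BINDING_2D, &previous);   /* query (may stall an MT driver) */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glBindTexture(GL_TEXTURE_2D, (GLuint)previous);     /* restore */
}

/* Direct state access: the object is named explicitly, and the texture bound
   for drawing is never touched (EXT_direct_state_access). */
void updateDSA(GLuint tex, int w, int h, const void *pixels)
{
    glTextureSubImage2DEXT(tex, GL_TEXTURE_2D, 0, 0, 0, w, h,
                           GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}
```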

Will the slides from the OpenGL session at GDC be available on the Khronos site?