Official feedback on OpenGL 3.2 thread

Are you sure? I already used that in GLSL 1.30.

All I meant in my original post was that in the GLSL 1.50 spec they’ve added the precision predeclaration for the float type to the global scope of the fragment language. In previous versions of the GLSL spec only the int type was predeclared in the global scope of the fragment language, so in the GLSL 1.50 spec the precision predeclaration for the float type should have been highlighted in magenta.

GLSLangSpec.Full.1.30.08.withchanges.pdf (page 36):

The fragment language has the following predeclared globally scoped default precision statement:
precision mediump int;

GLSLangSpec.Full.1.40.05.pdf (page 37):

The fragment language has the following predeclared globally scoped default precision statement:
precision mediump int;

GLSLangSpec.1.50.09.withchanges.pdf (page 45):

The fragment language has the following predeclared globally scoped default precision statements:
precision mediump int;
precision highp float;

Which will cause glTexImage not to return until the copying is done, which is another thing that I want to avoid. And then there is glTexSubImage, which, depending on how the driver works, may stall waiting for a sync or create a copy of the data…

glBufferData(…, NULL) also has a cost: it uses two buffers on the card where in some cases it could use only one (wait for the buffer to be no longer needed, then DMA-copy the data and signal the related sync object). When you “stream hundreds of megabytes per second”, this may have an impact.

I also use glBufferData(…, NULL) currently, which is a pity, as in many cases only a small part of the mesh has changed. I will probably separate the mesh into chunks, among other reasons to avoid sending a million verts when only 1000 have changed. I will also test glBufferSubData vs. changing parts of the mapped buffer vs. updating the whole buffer. Then the app will selectively use whichever of the three approaches works better in the current case, but that is ugly. There has to be a better way to do this.
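
For reference, a minimal sketch of the first two per-chunk update paths (the buffer name, offsets, sizes and data pointer below are placeholders, not anything from this thread):

glBindBuffer(GL_ARRAY_BUFFER, meshVbo);

/* Option A: sub-update only the dirty range from a client-side copy. */
glBufferSubData(GL_ARRAY_BUFFER, chunkOffset, chunkSize, chunkData);

/* Option B: map only the dirty range and write into it directly.
   GL_MAP_INVALIDATE_RANGE_BIT tells the driver the old contents of that
   range can be discarded. */
void *p = glMapBufferRange(GL_ARRAY_BUFFER, chunkOffset, chunkSize,
                           GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
if (p) {
    memcpy(p, chunkData, chunkSize);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}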

I recently coded a library for loading bit-mapped TTF fonts into OpenGL textures (and VBOs). Even loading a single font showed a noticeable difference in loading times between glMapBuffer and glBufferSubData. Running on the latest nVidia drivers, the latter is completely useless for small chunks of data; I’m unsure how well it scales up with larger data chunks.

It’s quite embarrassing when there are better wrappers available for C# than for C++! People seem reluctant to integrate OpenGL 3.1/3.2 into their wrappers and APIs; why is this? Are the changes that difficult for them to make? (That’s not me being rude.)

nVidia have already released OpenGL 3.2 drivers. It’s more AMD/ATI that are the problem in pushing the post-3.0 spec.

I feel we should move to another thread.

Not sure about glTexImage; I am using it only to “create” the texture. BTW, at least on NVIDIA, this is almost a no-op. It does almost nothing. The real hard work is done when I call glTexSubImage for the first time (even when a PBO is in use).

glTexSubImage with a PBO is async for sure. It returns immediately (in less than 0.1 ms).
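
For reference, a minimal sketch of that PBO upload path (the buffer/texture names, sizes and formats are placeholders):

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, imageBytes, NULL, GL_STREAM_DRAW); /* orphan old storage */
void *p = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (p) {
    memcpy(p, pixels, imageBytes);        /* or decode/generate directly into the mapping */
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
}

/* With a PBO bound, the last argument is a byte offset into the buffer, not a
   client pointer, so the call can return before the transfer has finished. */
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);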

Yes, it has some cost. But this way you can trade memory for CPU clocks. The driver can create a second buffer to avoid waiting for the PBO until it is available.

I agree that OpenGL is going in the right direction. Especially the ARB_sync extension is nice.
I am pretty surprised, by the way, that ARB_geometry_shader4 is core from now on, because it’s a deprecated feature. I think it was put into core just because D3D supports it. I would rather go in the direction of the tessellation engine provided by AMD/ATI since the HD2000 series cards. That is a much more flexible piece of functionality, and it is already, or will soon be, supported by D3D. The same things can be done with it as with geometry shaders, and much more.
This geometry shader thing is only present because, at the time the HD2000 came out, NVIDIA’s G8x cards weren’t able to do such a thing.

P.S.: This buffer object performance discussion has gone out of control, by the way, so you’d better continue it in a much more appropriate place :slight_smile:

Have you read GLSLangSpec.1.40.07 (May 1st 2009)?

GLSLangSpec.1.40.07 (Pg.36)

The fragment language has the following predeclared globally scoped default precision statements:
precision mediump int;
precision highp float;

And I don’t know why it is important at all, because…

GLSLangSpec.1.40.07 (pg.35) / GLSLangSpec.1.50.09 (pg.44)

4.5 Precision and Precision Qualifiers

Precision qualifiers are added for code portability with OpenGL ES, not for functionality. They have the same syntax as in OpenGL ES, as described below, but they have no semantic meaning, which includes no effect on the precision used to store or operate on variables.

Only Catalyst drivers require precision qualifiers in fragment shaders. But maybe even that will change when OpenGL 3.1 support comes.

nVidia have already released OpenGL 3.2 drivers.

Beta drivers don’t count.

I am glad that NV_depth_clamp finally got into core (and an ARB backport extension too!). We also got geometry shaders.

Too bad that GL_EXT_separate_shader_objects did not make it into core in some form. (And there is a bit in it that is kind of troubling: it has you write to gl_TexCoord[], but GL 3.2 says that is deprecated, so to use GL_EXT_separate_shader_objects does one need to create a compatibility context?)

Also a little perverse is that context creation now has another parameter, so now we have:

GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB (forward compatible) for the attribute GLX_CONTEXT_FLAGS_ARB,
and
GLX_CONTEXT_CORE_PROFILE_BIT_ARB / GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB for the attribute GLX_CONTEXT_PROFILE_MASK_ARB
(and similarly under Windows).
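
For concreteness, roughly what that looks like on the GLX side; this is just a sketch, assuming glXCreateContextAttribsARB has already been fetched via glXGetProcAddress, with dpy and fbconfig as placeholders:

static const int attribs[] = {
    GLX_CONTEXT_MAJOR_VERSION_ARB, 3,
    GLX_CONTEXT_MINOR_VERSION_ARB, 2,
    GLX_CONTEXT_FLAGS_ARB,         GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB,
    GLX_CONTEXT_PROFILE_MASK_ARB,  GLX_CONTEXT_CORE_PROFILE_BIT_ARB,
    None
};

/* share_context = NULL, direct rendering = True */
GLXContext ctx = glXCreateContextAttribsARB(dpy, fbconfig, NULL, True, attribs);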

I also have the fear that for each version of GL, the description of GLX_ARB_create_context/GLX_ARB_create_context_profile will grow, i.e. a case for each GL version >= 3.0 (shudders)

So now I dream of:

  1. direct state access for buffer objects, i.e. killing off bind to edit buffer objects.
  2. decoupling of filter and texture data and a direct state access API to go with it.
  3. nVidia’s bindless API: GL_NV_shader_buffer_load and GL_NV_vertex_buffer_unified_memory

nVidia’s bindless graphics is pretty sweet IMO.

Well, we kinda had to introduce an enabling mechanism to select the profile, once we introduced profiles. If you don’t specify a profile or version or context flags in the attribute list, then something (hopefully) reasonable still happens: you get the core profile of the latest version supported by the driver. There is some redundancy in that a forward-compatible compatibility profile is exactly equivalent to a compatibility profile, since nothing is deprecated from the compatibility profile, but I think the attributes all make sense.

We don’t have any more profiles under discussion, if that’s a concern. Conceivably OpenGL ES and OpenGL could eventually merge back together through the profile mechanism, but that’s a long way into the future if it happens at all.

To speak somewhat in our own defense, the GL and GLSL specs are huge, complex documents that go through many, many revisions during the process of creating a new version. John and I try to keep the change markings sensible but it’s not going to be a 100% error free process; they’re mostly just a guideline to finding changed stuff, to accompany the new feature summaries. We don’t mark the places where a lot of text is removed with strikethroughs, for example. Our focus is on getting the new spec out on schedule, so some stuff like this is always going to fall through the cracks.

I see the use/reason for having forward-compatible and profile selection (i.e. core vs. compatibility) separate, as they address two different issues, but those issues overlap (at this point a great, great deal)… and can you imagine the mess that occurs when trying to teach this stuff to people new to GL? They will scream that one is splitting hairs.

But I am really surprised that bind-free editing of buffer objects didn’t make it; I figured it would have been a kind of no-brainer to specify and for driver writers to deal with: something as simple as, for each buffer object function glSomeBufferFunction, a new function glSomeBufferFunctionNamed which has one extra parameter, a GLuint naming the buffer object; ditto for texture stuff too.
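
Roughly the kind of thing I have in mind, next to today’s bind-to-edit (the named form below follows GL_EXT_direct_state_access; vbo, offset, size and data are placeholders):

/* Bind-to-edit, as in core GL today: the buffer must be bound before editing. */
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferSubData(GL_ARRAY_BUFFER, offset, size, data);

/* Named ("direct state access") style: the buffer object is passed directly,
   no bind required. */
glNamedBufferSubDataEXT(vbo, offset, size, data);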

One point of irony for me right now is that among the things deprecated (or rather, kept only in the compatibility profile) is the naming of interpolators in shaders. I prefer, in the language of GL_EXT_separate_shader_objects, “rendezvous by resource” instead of the current “rendezvous by name”. The argument for removing an interpolator at link time to optimize I have always found to be kind of thin (especially since for each (vertex, fragment) or (vertex, geometry, fragment) tuple of shaders one has to have a different program)… does anyone have a reasonable real-world example where one leaves an interpolator present that would get optimized out?

Just out of curiosity, is there active movement on:

  1. direct state access
  2. separation of texture data from filtering method?

I am starting to think, given that 2) is quite hairy to do well, we won’t see anything like that till GL 4.0…

Also, I am really, really overjoyed that the GL API is now getting updated more regularly; is that pace expected to remain? (I suspect that the pace of updates will slow once GL core exposes all the features that D3D exposes at the time, and that afterwards there will only be big updates at new GFX card generations and minor updates making things “cleaner”.)

They overlap but the use cases are very different. The FC context is really only intended as a futureproofing aid for developers, along the lines of the (otherwise so-far mythical) “debug context”. Personally I wouldn’t even tell someone new to GL about it for quite a while.

Separation of filters and samplers is on the agenda for a future release, yes. DSA has been brought up as well, though it is perhaps less far along in terms of being seen as desirable by everyone.

You can’t avoid the copy with glBufferData. The data MUST be stored in page-locked system memory (to make sure your operating system won’t move it) before it can be asynchronously transferred to the GPU. There is only one way to access this memory directly, and so to avoid the copy: glMapBuffer{Range}. The same applies to textures: use a PBO. The buffer object represents both a GPU buffer and a corresponding buffer in page-locked system memory.
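
To illustrate the map path (a sketch only; streamVbo, bytes and generateVertices are placeholders): writing through the mapped pointer puts the data straight into that buffer’s staging memory, so there is no extra client-side copy.

glBindBuffer(GL_ARRAY_BUFFER, streamVbo);
void *p = glMapBufferRange(GL_ARRAY_BUFFER, 0, bytes,
                           GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (p) {
    generateVertices(p);          /* build the data in place instead of memcpy'ing it */
    glUnmapBuffer(GL_ARRAY_BUFFER);
}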

If I understand correctly, you want to have a fine-grained control over PCIe transfers. In CUDA it’s easy, you can use cudaMemcpyAsync and place a fence (called “event” in CUDA) after that, and then ask whether all preceding commands have ended. Because you have no such control in OpenGL, it’s hard to give advice. ARB_copy_buffer might help here but I am not sure. As someone already said, “only driver guys can tell”.
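
For comparison, the CUDA pattern described above looks roughly like this (devPtr, hostPtr, bytes and stream are placeholders; the host buffer is assumed to be pinned with cudaMallocHost so the copy really is asynchronous):

cudaEvent_t copyDone;
cudaEventCreate(&copyDone);

cudaMemcpyAsync(devPtr, hostPtr, bytes, cudaMemcpyHostToDevice, stream);
cudaEventRecord(copyDone, stream);      /* "fence" placed right after the copy */

/* Later: poll without blocking; cudaSuccess means everything up to the event is done. */
if (cudaEventQuery(copyDone) == cudaSuccess) {
    /* safe to reuse hostPtr or consume the result on the GPU */
}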

… Personally I wouldn’t even tell someone new to GL about it for quite a while.

That is my point: someone new to GL would say “What the?” But it is quite awkward if you look at what effective practice was:

GL3.0: request forward compatible context to make sure one was not using naughty deprecated stuff, mostly fixed function pipeline.

GL3.1: ditto, but if you did not request a forward-compatible context then expect ARB_compatibility to be present; chances are you then need code to check whether it is there (see the sketch after this list). Often the context creation code is buried in some platform-dependent monster that nobody likes to read, which is worse for cross-platform code.

GL3.2: now be aware of the difference between deprecated features and compatibility features; in theory someone twisted could request a forward-compatible context with the compatibility profile (shudders).
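
A minimal sketch of that check on a 3.x context (illustrative only; glGetStringi replaces walking the old monolithic extension string):

GLint n = 0;
GLboolean hasCompat = GL_FALSE;
glGetIntegerv(GL_NUM_EXTENSIONS, &n);
for (GLint i = 0; i < n; ++i) {
    const char *ext = (const char *) glGetStringi(GL_EXTENSIONS, (GLuint) i);
    if (ext && strcmp(ext, "GL_ARB_compatibility") == 0)
        hasCompat = GL_TRUE;
}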

Hopefully, no more new context creation attributes will be added.

On a related note to my post (but not to GL 3.2): reading GL_EXT_separate_shader_objects, I found something that I thought was just plain bad:

  1. Can you use glBindFragDataLocation to direct varying output
    variables from a fragment shader program created by
    glCreateShaderProgramEXT to specific color buffers?
    UNRESOLVED:
    Tenative resolution:  NO for much the same reason you can't do
    this with attributes as described in issue 15.  But you could
    create the program with the standard GLSL creation process where
    you attach your own shaders and relink.
    For fragment shader programs created with
    glCreateShaderProgramEXT, there is already the gl_FragData[]
    builtin to output to numbered color buffers.  For integer
    framebuffers, we would need to add:
        varying out ivec4 gl_IntFragData[];
    User-defined output fragment shader varyings can still be used
    as long as the application is happy with the linker-assigned
    locations.

My thoughts on that are “ick”… since if one is doing MRT, then one would possibly need to call glDrawBuffers on every shader switcheroo… ick… it would be better if they just added a pragma-like interface to fragment shaders:


pragma(out vec4 toFrag0, 0)
pragma(out ivec4 toFrag1, 1)

or extend the layout() deal in fragment shaders:


layout(fragment_output, 0) out vec4 toFrag0;
layout(fragment_output, 1) out ivec4 toFrag1;

and along those lines one could then use that kind of mentality on interpolators, i.e in vertex shaders:


layout(vertex_output, 0) out vec4 myValue;
layout(vertex_output, 1) out flat vec4 myFlatValue;

and in geometry shaders:


layout(geometry_input, 0) in vec4 myValue;
layout(geometry_input, 1) in vec4 myFlatValue;

layout(geometry_output, 0) out vec4 myValueForFragging;

and then even in fragment shaders:


layout(fragment_input, 0) in vec4 myValueForFragging;

but on closer inspection, since the out/in qualifier is already there, it can all be collapsed to:


layout(location, N) in/out [flat, centroid, etc] type variable_name;

The sweet part of this being that one can then dictate where attributes and interpolators are; the lazy could even skip calling glBindAttribLocation…

This comment probably would be best in suggestion for next release or some kind of GL_EXT_separate_shader_objects thread.

You can do that, but as previously noted, nothing is deprecated from the compatibility profile, so a forward-compatible compatibility profile is exactly the same thing as a non-FC CP :slight_smile:

I would be a little surprised if any of the remaining deprecated-but-not-removed features in the core profile are actually removed from core anytime in the near future. It’s challenging enough dealing with the number of options we have today.

I don’t like this, either. The previous issue, about vertex input attribute binding locations, has the same problem. I’m used to binding my vertex attributes to known locations when I load a shader, both to avoid excessive vertex array rebinding and to remove the need to query and store the locations of every attribute, for every shader…
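
For concreteness, a minimal sketch of binding everything to fixed slots before linking (the attribute/output names and indices here are just examples):

glBindAttribLocation(program, 0, "inPosition");
glBindAttribLocation(program, 1, "inNormal");
glBindFragDataLocation(program, 0, "outColor");   /* GL 3.0+ */
glLinkProgram(program);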

A big sloppy kiss for the quick ref cards! :slight_smile:

I don’t like this, either.

Which is why this is an nVidia extension (despite the EXT name) and not the eventual solution to this problem. It tries to solve the problem by making OpenGL pretend to do things the D3D way, but without realizing that OpenGL has issues with doing things that way.

By the way, does anyone know if glu.h is OK with the gl3.h header file? I know some of GLU is no longer relevant to OpenGL 3.2; is it compatible at all with a forward-compatible context?

Many thanks