Version 3.x

Groovounet: squeaky wheels have more chance of getting oiled, so I’m squeaking.

But yes, Rob, your replies are very much appreciated :slight_smile:

Second that!

With object synchronization we can sequence and parallelize other GL commands. For instance, we can cycle through a vertex array without having to worry about interfering with an in-progress draw that is accessing the same region of the array.

I would love to see a glGet that returns the compiled ARB program for a GLSL shader, so the developer can debug it or track down performance hits.

Third that.

He deserves to be paid reverence, as the Crusader of the Pixels, Puffers and People. :slight_smile:

This one will not be in 3.1, but has been an active discussion topic amongst the working group. If you could elaborate on some use cases you are encountering where you might predict a benefit from the ability to cache and reload a post-linked binary form of a shader, that would be helpful. (This is a feature I would certainly advocate for, and we have some specific workloads in mind, but I would like to hear other POVs on it.)

Concretely, if you look at the NV or Apple fence extensions, are these in the ballpark for the capability you need?

There is at least one type of usage under which you can safely overlap CPU and GPU access to a common vertex buffer, which is to adopt a write-once policy with a strictly ascending cursor.

If you use MapBufferRange you can eliminate the blocking delay when the CPU wants to write some new data, while the GPU/driver may still be fetching previously written data.

When you get to the end of the buffer, orphan it (BufferData with a NULL source pointer, keep size the same) and wind the cursor back to offset zero. If you do this properly, there won’t be any hazard.
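
Roughly, a minimal sketch of that scheme, assuming the buffer has already been created with glBufferData and is bound to GL_ARRAY_BUFFER (stream_write and the globals are made up for illustration):

static GLsizeiptr vbo_size;   // total size passed to glBufferData
static GLintptr   cursor;     // strictly ascending write offset

void *stream_write(GLsizeiptr bytes)
{
    if (cursor + bytes > vbo_size) {
        // End of buffer: orphan it (same size, NULL data) and rewind the cursor.
        glBufferData(GL_ARRAY_BUFFER, vbo_size, NULL, GL_STREAM_DRAW);
        cursor = 0;
    }

    // Map only the region we are about to fill. Unsynchronized is safe here
    // because nothing has been written to this region since the last orphan.
    void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, cursor, bytes,
                                 GL_MAP_WRITE_BIT |
                                 GL_MAP_INVALIDATE_RANGE_BIT |
                                 GL_MAP_UNSYNCHRONIZED_BIT);
    cursor += bytes;
    return ptr;   // caller fills it, then calls glUnmapBuffer(GL_ARRAY_BUFFER)
}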

This won't be useful for more fine-grained schemes where you may be performing multiple updates on a buffer repeatedly, and you want to complete pending draws on those regions before touching them again - for that you would need a true GPU progress indicator such as a fence.
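
For completeness, with the NV fence extension mentioned above such a progress indicator would look roughly like this (just a sketch; draw_from_region and update_region are placeholder names):

GLuint fence;
glGenFencesNV(1, &fence);

draw_from_region(offset, bytes);              // placeholder: a draw sourcing that region
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);     // fence is inserted right after the draw

// ... later, before touching that region again:
if (glTestFenceNV(fence)) {
    update_region(offset, bytes);             // GPU has passed the fence: safe to write
} else {
    // still pending: do other work, or glFinishFenceNV(fence) to block until done
}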

Have you all gone to Rob’s profile and voted him 5 stars yet?

They don’t work across shared contexts, which can be a problem in multithreaded programs where you really need them.
Whatever happened to:
http://oss.sgi.com/projects/ogl-sample/feedback/sync.spec.txt
This was written way back in April 2006 and extended fences across shared contexts.

The most important synchronization is VSync. Putting a fence after SwapBuffers can tell us that the previous frame is complete, but it doesn’t tell us whether the VSync occurred BEFORE the buffers were swapped.
I need a way to detect dropped frames, perhaps a count of how many VSyncs occurred between the last two buffer swaps.
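
On platforms that expose GLX_SGI_video_sync, something along these lines can at least approximate that today (just a sketch; it assumes the extension function is resolved, and the glFinish is only a crude way to wait for the swap):

#include <GL/glx.h>

static unsigned int prev_retrace;   // vertical retrace counter at the previous swap

void swap_and_check_for_drop(Display *dpy, GLXDrawable drawable)
{
    unsigned int retrace;

    glXSwapBuffers(dpy, drawable);
    glFinish();                       // crude: wait until the swap has really happened
    glXGetVideoSyncSGI(&retrace);     // current retrace count from GLX_SGI_video_sync

    if (prev_retrace != 0 && retrace - prev_retrace > 1) {
        // more than one retrace elapsed since the last swap: a frame was dropped
    }
    prev_retrace = retrace;
}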

Simon, that is really good feedback on sync facilities, much appreciated.

To repeat it here:

http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=254690#Post254690

Are there plans to make the entry function of a shader a user-specifiable parameter? Having this would allow layering a lightweight effect system on top of GLSL and make sharing common code easier.

Can’t this currently be done with GL3?

When using glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT, one needs to know whether the GPU is done with the frame that last used that region of the buffer. If one places a dummy occlusion query at the end of the frame, couldn’t one asynchronously poll the query result to know when that frame has completed, and thus when it is safe to do an unsynchronized map?

Sure, it isn’t the intent of occlusion queries, but it should work in theory.
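
Something along these lines, assuming end_of_frame_query was created earlier with glGenQueries (the query name, offset and bytes are just illustrative):

// At the end of the frame that last read from the buffer region:
glBeginQuery(GL_SAMPLES_PASSED, end_of_frame_query);
/* draw a tiny dummy primitive, or nothing at all */
glEndQuery(GL_SAMPLES_PASSED);

// Later, before mapping that region again:
GLuint frame_done = 0;
glGetQueryObjectuiv(end_of_frame_query, GL_QUERY_RESULT_AVAILABLE, &frame_done);
if (frame_done) {
    void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, offset, bytes,
                                 GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    // ... write the new data, then glUnmapBuffer(GL_ARRAY_BUFFER) ...
}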

From the GLSL spec:

#line must have, after macro substitution, one of the following forms:
#line line
#line line source-string-number
where line and source-string-number are constant integer expressions. After processing this directive (including its new-line), the implementation will behave as if it is compiling at line number line+1 and source string number source-string-number. Subsequent source strings will be numbered sequentially, until another #line directive overrides that numbering.
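
To make that bookkeeping concrete, this is roughly what one has to do today with plain integers (load_text, the file names and the exact "string(line)" log format are just illustrative; shader is an existing shader object):

static const char *source_names[] = { "common.glsl", "lighting.glsl", "its_a_shader.glsl" };

const char *chunks[3];
chunks[0] = load_text("common.glsl");      // illustrative loader returning a NUL-terminated string
chunks[1] = load_text("lighting.glsl");
chunks[2] = load_text("its_a_shader.glsl");

glShaderSource(shader, 3, chunks, NULL);   // the chunks become source strings 0, 1 and 2
glCompileShader(shader);
// On error, glGetShaderInfoLog typically reports something like "2(42): ...";
// the leading 2 has to be mapped back through source_names[] by hand.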

It would be nice to relax the restriction that source-string-number must be a constant integer. That would make it easy to manually add #line 42 its_a_shader.glsl to each shader to improve the quality of the error log, especially in the presence of many shaders. Implementation-defined line numbers would still be integers, though. Alternatively, it might be possible to have something like this:

glShaderParametersv(shader_id, GL_SHADER_NAME, "its_a_shader.glsl");

Cg supports this.

Similarly, changing the entry point of a shader could be done with

glShaderParameter(shader_id, GL_SHADER_MAIN, "not_main");

(I could not edit the post above)

Can’t this currently be done with GL3?

When using glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT, one needs to know whether the GPU is done with the frame that last used that region of the buffer. If one places a dummy occlusion query at the end of the frame, couldn’t one asynchronously poll the query result to know when that frame has completed, and thus when it is safe to do an unsynchronized map?

Sure, it isn’t the intent of occlusion queries, but it should work in theory.

It’s an interesting possibility; I hadn’t considered abusing occlusion queries in that fashion. But more generally speaking, we need to get real sync capability into the GL core.

There is definitely desire to do this. Can’t really speak to scheduling on it yet.

This one will not be in 3.1, but has been an active discussion topic amongst the working group. If you could elaborate on some use cases you are encountering where you might predict a benefit from the ability to cache and reload a post-linked binary form of a shader, that would be helpful.

I’ll volunteer one.

We have a hard 60Hz requirement (military flight sims). There can be no frame breakage in our app. Nevertheless, sometimes there’s a need to change an app setting which would mandate different logic in many/all of the shaders we’re rendering with. This is the usual thing: fog, number of lights, light types, rendering mode, new material we’ve not seen before just got loaded, etc.

Real-time shader compilation with GLSL is a total non-starter because it’s in the driver. You can’t even off-load it to a background core or stage the compiles over multiple frames to avoid breaking frame.

However, we can spend as much time pre-computing as we want. We could surf the whole database (huge; 100GB+ huge), precompile all the shaders for the target hardware at DB-build time, and then just load them straight off disk into the hardware at run-time and not break any frames … if precompiled shaders were possible in stock OpenGL.

Note that the other options are not so pretty. We could use a bunch of shader "if"s and loops instead of shader permutations, to the detriment of performance and with no support on older cards – not really practical.

OR we could switch away from GLSL to Cg, because with Cg the compile happens more intelligently above the GL API layer, in parallelizable user-space CPU code which can be tossed onto a background core/CPU, or even onto a completely different machine in the “rendering node cluster” connected via Ethernet. In the absence of GLSL precompiled shader support, this is what we’ll have to do. Despite the Cg global run-time lock (which will make implementing this more of a pain than it needs to be), this still seems like the only workable option.

My app has two use cases:

  1. Scene editor. The user can create many shader combinations, and linking the shaders every time the app starts can take a long time. Being able to load compiled shaders directly from disk would simply save the user time and frustration.

  2. Playout. This has a hard 60 Hz frame rate requirement (television). Scenes are loaded and displayed in sequence. The next scene in the sequence is loaded and its shaders are compiled in a low-priority thread. However, on NVidia hardware at least, calling glLinkProgram pauses all OpenGL contexts in all threads in the process for the duration of the link, causing playout to stutter.
    Ideally that stutter just wouldn’t happen, but being able to save shader binaries would avoid it in two ways:
    (a) if we’ve linked the shader before on the machine we can just fetch it from disk
    (b) if the shader needs linking, we can do the link in a separate process (which doesn’t affect the OpenGL contexts in the main process) and copy the program binary to the main process (see the sketch below).
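
For illustration, the kind of flow both cases boil down to; the glGetProgramBinary / glProgramBinary entry points, the GL_PROGRAM_BINARY_LENGTH query and save_blob are hypothetical here, purely to make the request concrete:

// Tool / background-process side: link once, then cache the result.
// "program" is the fully linked program object.
GLint  length = 0;
GLenum format = 0;
glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);   // hypothetical query
void  *blob = malloc(length);
glGetProgramBinary(program, length, NULL, &format, blob);     // hypothetical entry point
save_blob("scene_shader.bin", format, blob, length);          // hypothetical disk helper

// Playout side: no compile or link anywhere near the frame loop.
GLuint p  = glCreateProgram();
GLint  ok = 0;
glProgramBinary(p, format, blob, length);                     // hypothetical entry point
glGetProgramiv(p, GL_LINK_STATUS, &ok);
if (!ok) { /* driver rejected the blob: fall back to compiling from source offline */ }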

Related to this, NVidia defers some shader compile/link/optimization overhead until the first draw call that uses a shader (possibly uniform setup related), causing a stutter of sometimes over 120ms per first draw call per shader. That’s over seven 60 Hz frames per shader!!! We last saw this big-time on a GeForce 7950GT. One frame drop is too many (much less 7), but we were seeing many, many more all-at-once, where the driver went out-to-lunch for 0.5-2.0 seconds recompiling/reoptimizing shaders inside the draw calls. That’s nuts.

Part of my hope with precompiled shaders is that all of this compile/link/optimize overhead would be done before the final shader “binary” is cooked and queried, so as to extricate all this time-consuming vendor-specific shader “prep work” from what is supposed to be a real-time render loop.