Hello,
I have read an interesting article about the new geometry shaders in OpenGL at http://appsrv.cse.cuhk.edu.hk/~ymxie/Geometry_Shader/
and it gave me two ideas.
It seems that vertex shaders are just a special case of geometry shaders whose input and output primitive types are set to points and which emit one primitive per invocation.
So why do we still need them? Shouldn't they be relegated to a compatibility layer?
The answer may be: it is handy to set up the scene first and then generate more geometric detail.
OK, then what about shading layers, where you can specify a stack of shaders that are executed in order as successive passes?
Here is my concept:
           geometry shading stack              fragment shading stack
               +---+---+---+                       +---+---+---+
+----------+   |   |   |   |   +---------------+   |   |   |   |         +----+
| geometry |-->| 1 | 2 |...|-->| rasterization |-->| 1 | 2 |...|-->...-->| FB |
+----------+   |   |   |   |   +---------------+   |   |   |   |         +----+
               +---+---+---+                       +---+---+---+
               \_____ _____/                       \_____ _____/
                     V                                   V
      GL_MAX_GEOMETRY_SHADING_LAYERS      GL_MAX_FRAGMENT_SHADING_LAYERS
A driver can handle up to GL_MAX_GEOMETRY_SHADING_LAYERS and GL_MAX_FRAGMENT_SHADING_LAYERS layers per stack. glUseProgramObject() must be extended to support layers, and every program object must be attached to one layer (similar to FBO color attachments).
If a layer has no attached program, it is omitted during execution. If no program is attached to any layer, the OpenGL pipeline uses fixed functionality.
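To make this concrete, here is a minimal sketch of what per-layer binding could look like from the application side. The layer enums and glUseProgramObjectLayer() are made-up names for this proposal; only the program-object creation calls are existing ARB functions:

/* Hypothetical sketch of the proposed per-layer binding; the enums and
 * glUseProgramObjectLayer() below do not exist in any OpenGL header.    */
#define GL_FRAGMENT_SHADING_LAYER0 0x9000   /* made-up enum values       */
#define GL_FRAGMENT_SHADING_LAYER1 0x9001
void glUseProgramObjectLayer(GLenum layer, GLhandleARB program);

/* Inside some setup function: build two ordinary program objects ...    */
GLhandleARB blur    = glCreateProgramObjectARB();   /* real ARB calls    */
GLhandleARB compose = glCreateProgramObjectARB();
/* ... attach shaders, compile and link them as usual ...                */

/* ... then attach each program to a fragment shading layer, just like
 * FBO color attachments.  Layer 0 runs first, layer 1 sees its output.  */
glUseProgramObjectLayer(GL_FRAGMENT_SHADING_LAYER0, blur);
glUseProgramObjectLayer(GL_FRAGMENT_SHADING_LAYER1, compose);

/* Layers with no attached program are skipped; if every layer is empty
 * the fixed-function pipeline is used, exactly as described above.      */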
Program objects cannot mix geometry and fragment shaders (a restriction), because every layer in a stack operates on the same type of data (geometry primitives or fragments) and must produce that same type for the next layer; this also makes it natural to implement shader stacks as objects (see below).
Programs in a stack must be executed in order, from first to last. Each program in a lower layer must finish completely before a higher layer can be computed, so at least two buffers (input and output) are needed for the intermediate results between programs. The result of the last executed program in the stack is used by the subsequent stages of the OpenGL pipeline.
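Conceptually, the driver-side execution of a stack would then be a simple ping-pong loop over those two buffers. The fragment below is only illustrative pseudocode in C style; Buffer, run_program() and swap() are invented names, not driver or GL API:

/* Illustrative pseudocode for the proposed execution order; Buffer,
 * run_program() and swap() are invented, not real driver or GL calls.   */
Buffer *in  = data_from_previous_stage;   /* primitives or fragments     */
Buffer *out = scratch_buffer;

for (int layer = 0; layer < num_layers; ++layer) {
    if (stack[layer] == NULL)
        continue;                         /* empty layer: skip it        */
    run_program(stack[layer], in, out);   /* lower layer finishes fully  */
    swap(&in, &out);                      /* its output feeds the next   */
}
/* 'in' now holds the result of the last executed program and is handed
 * to the next stage of the pipeline.                                    */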
Stacks of shaders could also be implemented as objects: instead of binding shader programs to fixed stacks with glUseProgramObject(), we would create a stack object, attach programs to it, and bind it with glUseGeometryStack(stack_handle) or glUseFragmentStack(stack_handle), just like we use glUseProgramObject() today (stack_handle == 0 means fixed functionality).
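As a sketch of how that could look, reusing the blur and compose program objects from above (glCreateShadingStack(), glAttachStackProgram(), glUseFragmentStack() and GL_FRAGMENT_SHADING_STACK are invented names for this proposal, not existing API):

/* Hypothetical stack-object API; none of these entry points exist yet.  */
GLuint stack = glCreateShadingStack(GL_FRAGMENT_SHADING_STACK);

/* attach ordinary program objects to successive layers of the stack     */
glAttachStackProgram(stack, 0, blur);      /* layer 0: runs first        */
glAttachStackProgram(stack, 1, compose);   /* layer 1: runs on its output */

/* bind the whole stack, analogous to glUseProgramObject() today;        */
/* glUseFragmentStack(0) would mean fixed functionality                  */
glUseFragmentStack(stack);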
In addition, it could also be possible to stream data out of a stack directly to VRAM, as in DirectX 10, by setting a stream-out attribute on a particular stack object (that attribute could simply be a handle to a vertex buffer for a geometry shading stack, or to a texture for a fragment shading stack; if it is set to 0, the data are propagated to the next stage of the graphics pipeline).
The advantage here is that data are transferred only once, after all stack operations, without the API involvement that DirectX 10 requires on every round trip between shaders.
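For a geometry shading stack, setting that stream-out attribute could look roughly like this (glStackStreamOut() is an invented name and geometry_stack a previously created stack object; the buffer-object calls are the real ARB_vertex_buffer_object ones):

/* Create a vertex buffer object to receive the stream-out data.         */
GLuint vbo;
glGenBuffersARB(1, &vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, 1 << 20, NULL, GL_STREAM_COPY_ARB);

/* Invented call: route the output of the whole geometry stack into the
 * vertex buffer instead of passing it on to rasterization.              */
glStackStreamOut(geometry_stack, vbo);

/* glStackStreamOut(geometry_stack, 0) would restore the normal flow,
 * i.e. the stack output continues to the next pipeline stage.           */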
This scheme has five big advantages:
- a single rendering pass gives more flexibility in computing the final output
- it reduces the number of additional passes that have to be performed, which reduces API overhead (one of the new OpenGL LM goals)
- it simplifies the OpenGL interface: there is no longer a need for separate vertex shader functionality (another OpenGL LM goal)
- since we operate with full geometry shaders from the beginning, we can work not only on individual points (as in vertex shaders) but also on whole primitives during geometry setup
- if shader stacks are objects, different/unrelated operations could be computed in parallel in the future, once multipath rendering with synchronization is supported (i.e. when there are two or more graphics cards, or when GPU parallelism is exposed to software)
Use cases for geometry shaders are pretty obvious, but what about fragment shaders?
There is a need for this too. For example, I was writing a complete hardware MPEG decoder based on OpenGL. The main disadvantage, which degraded performance and rendering quality, was the lack of a way to quickly perform precomputations that the next fragment computation depends on. A fast IDCT can be performed in two separate passes, and joining them in a shader stack would make it very fast.
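Using the hypothetical stack-object calls sketched above, those two IDCT passes would simply become two layers of one fragment shading stack:

/* Sketch only: idct_rows and idct_cols are ordinary fragment programs
 * (1-D IDCT over rows, then over columns); the stack calls are the
 * invented ones from the sketches above.                                 */
GLuint idct = glCreateShadingStack(GL_FRAGMENT_SHADING_STACK);
glAttachStackProgram(idct, 0, idct_rows);   /* pass 1: row-wise IDCT      */
glAttachStackProgram(idct, 1, idct_cols);   /* pass 2: column-wise IDCT   */
glUseFragmentStack(idct);
/* One draw call now performs both passes, without a second
 * render-to-texture round trip through the API.                          */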
Other multipass operations, such as blurring (one blur shader on a few layers) or shadow blending, also come to mind and could be done in a single pass with this approach.
Please consider these propositions for the upcoming “Mt. Evans” OpenGL release. I think it could be a huge improvement, especially compared to the current DirectX 10 specification.
Cheers
Wojtek