future of shaders

I actually suggest moving to a slightly different shader model, with global (shader-independent) uniforms and attributes. One could simply upload shaders and functions to the shader system and they would be accessible by name. This way, you could define your custom texture function and it would be accessible from every shader independently. But something like this is already possible in GLSL through shader linking.
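For example, roughly (all names here are made up for illustration), one shader object could provide the function and another could simply declare and call it, with the definition resolved at link time:

   // "library" shader object: defines a custom texture function
   vec4 myCheckerTexture(vec2 uv)
   {
       float c = mod(floor(uv.x * 8.0) + floor(uv.y * 8.0), 2.0);
       return vec4(c, c, c, 1.0);
   }

   // main fragment shader object: declares only the prototype and calls it;
   // both objects are attached to and linked into the same program
   vec4 myCheckerTexture(vec2 uv);

   void main()
   {
       gl_FragColor = myCheckerTexture(gl_TexCoord[0].st);
   }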

I am not sure that what I called “vertex source shader” could be of much help :wink: but it would be a logical next step after vertex and fragment shaders. Basically, I imagine this unit to be similar to the geometry shader unit, except that it would operate prior to the vertex unit and would be more general-purpose. Imagine something like this:

  
   // hypothetical "vertex source shader": reads raw buffers itself and
   // emits vertices into the pipeline, one step before the vertex unit
   buffer vertexData;
   buffer indexData;
   buffer texData;

   uniform int vertexCount;
   uniform int indexOffset;

   void main()
   {
     for (int i = 0; i < vertexCount; i++)
     {
       int index = readBufferInt(indexData, i * sizeof(int)) + indexOffset;

       g_Position = readBufferVec3(vertexData, index * sizeof(vec3));
       g_TexCoord = readBufferVec2(texData,    index * sizeof(vec2));

       EmitVertex();
     }
   }
  

So you could do things like this?

...
vertex_input = readBufferVec3(vertexData, index * sizeof(vec3));

if (vertex_input == vec3(0.0, 0.0, 0.0))
    vertex_input = vec3(0.0, 1.0, 0.0);
...

g_Position = vertex_input;

Well, I guess it would really be slow. Moreover, it might defeat some caching abilities. And concrete uses are really undetermined.

I’m also not sure global naming would be of much help. It would oblige users to choose unique names across all shaders and could provoke unexpected results, due to unintended name collisions or unintended name mismatches, as the number of shaders in a project grows or as a project pulls in shaders from other projects…

It’s true, it would be slow :slight_smile: Well, actually, that is not the point. Such a unit has no need for a general-purpose command set; it does not even need to be a shader. What I want is a more flexible alternative to vertex arrays, something that would allow several levels of indirection (= instancing / multi-index) etc.

The problem with global names can be solved the same way it was solved in everyday programming: one could introduce namespaces or something like that. I think the advantages outweigh the drawbacks, as the whole model becomes more intuitive. I don’t understand why I have to give each shader its own copy of a variable when they all share the same name and the same semantics. On the other hand, it may be very confusing to have several shaders that use uniforms with the same names but different semantics. A global model would reduce such problems.

Why would the hardware cache/optimize a texture shader better than a fragment shader?
Well, it kinda depends on what the texture shader would be able to do.

A fragment program is fundamentally prevented from accessing its neighbors, yet texture coordinate accessors do this all the time. If a procedural texture shader were asked to generate a full pixel-quad worth of textures (or somehow compiled to do so, thus providing a better abstraction), then you could have a performance advantage over a fragment program.

Alternatively, if a texture shader could actually store and retrieve data between invocations, that would be good too.

I don’t understand why I have to give each shader its own copy of a variable when they all share the same name and the same semantics.
Um, you don’t. If it truly refers to the same construct, then it’s the same variable, and the linker will merge them into one variable at the linking stage. And if they’re truly not the same, then they shouldn’t have the same name.
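For example, with today’s GLSL linking model (a trivial sketch):

   // vertex shader
   uniform vec4 baseColor;
   void main()
   {
       gl_FrontColor = baseColor;
       gl_Position   = ftransform();
   }

   // fragment shader, linked into the same program
   uniform vec4 baseColor;
   void main()
   {
       gl_FragColor = gl_Color * baseColor;
   }

   // after linking, "baseColor" is a single uniform of the program: one
   // glUniform4f(glGetUniformLocation(prog, "baseColor"), ...) call
   // sets it for both stages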

And if you were referring to the general use of the term “shader” as in what glslang calls a “program”, it works there too. One program cannot affect another.

That is not to say that I wouldn’t mind some namespace action. To be fair though, shaders are still incredibly small compared to real C programs. And even C99 doesn’t have namespace support. So I don’t think it’s that big of a deal.

Originally posted by Korval:
Why would the hardware cache/optimize a texture shader better than a fragment shader?
Well, it kinda depends on what the texture shader would be able to do.

A fragment program is fundamentally prevented from accessing its neighbors, yet texture coordinate accessors do this all the time. If a procedural texture shader were asked to generate a full pixel-quad worth of textures (or somehow compiled to do so, thus providing a better abstraction), then you could have a performance advantage over a fragment program.

Alternatively, if a texture shader could actually store and retrieve data between invocations, that would be good too.

I don’t think any specific texture shader would increase performance significantly, mostly because we now have (with the GF8800) unified shader processors, so any special optimization a texture shader would get, the fragment shader automatically gets as well.
So I don’t think it’s really justified to add a new shader type if the same thing can be done with an already existing one.

Nah, looking at the graphics pipeline there are currently only three places to put new and distinct shaders (whose job can’t be filled by another one).

  1. At the beginning: an object shader, or a sort of programmable instancing shader (though kind of redundant).
  2. In the middle: a rasterising shader; I don’t know how this would work, but there is a spot right there to fill with a shader, maybe for early Z and alpha testing.
  3. At the end: the blend shader; I can really see a significant use for this, not only because it would allow things like 3-color alpha blending and maybe order-independent transparency, but also because it would make a lot more GPGPU things possible.

So if you really want more, then you have to put it outside the traditional graphics pipeline (raytracing shaders, anyone?).

Originally posted by zeoverlord:
I don’t think any specific texture shader would increase performance significantly, mostly because we now have (with the GF8800) unified shader processors, so any special optimization a texture shader would get, the fragment shader automatically gets as well.

The fact that all the shaders run on the same hardware does not mean that there cannot be special optimizations, applied while a specific shader type is processed, that are not available to a different type (e.g. only fragment shaders can use texture functions with automatic mipmap level selection, because they have the screen-space derivatives; the price for that capability is reduced performance of dynamic branches in fragment shaders).
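To illustrate the derivative point with today’s GLSL (just a rough sketch):

   // fragment shader: the 2x2 pixel quad provides screen-space derivatives,
   // so the hardware can select the mipmap level automatically
   uniform sampler2D tex;
   void main()
   {
       gl_FragColor = texture2D(tex, gl_TexCoord[0].st);             // implicit LOD
   }

   // vertex shader: no derivatives are available here, so the level of
   // detail has to be supplied explicitly
   uniform sampler2D tex;
   void main()
   {
       gl_FrontColor = texture2DLod(tex, gl_MultiTexCoord0.st, 0.0); // explicit LOD
       gl_Position   = ftransform();
   }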

For example, the output of such a texture shader might be cached and shared between several pixels, in a similar way to how ordinary textures are currently cached.


So I don’t think it’s really justified to add a new shader type if the same thing can be done with an already existing one.

One thing is whether it can be done; another is how efficient or clumsy that implementation is. The vertex/fragment/geometry shaders represent an algorithm running on some data. The texture shaders are more data-like, in a similar way to ordinary textures, and like them it would be impractical if you needed one shader program for each combination of texture objects.

That justifies a different implementation, but not a different shader.

If the hardware could do something like a “texture shader”, and the user writes a fragment shader (without main, just a function for calculating a procedural texture), then the driver could just transparently use the “texture shader” capability instead of the standard fragment shader.

I just don’t see any syntactical difference between a second fragment shader that’s linked into the program and this “texture shader”. If the hardware can optimize it better, OK, go ahead and optimize it. I’m sure the driver can decide for itself when it can optimize certain things and when not. We don’t need a new shader type for this.

Originally posted by Overmind:

I just don’t see any syntactical difference between a second fragment shader that’s linked into the program and this “texture shader”.

To be useful, that texture shader would need to operate like a texture and be bound to samplers instead of being linked into the program. It would be like a texture object that contains shader code instead of texel data. Something like:

ts = glCompileTextureShader(source_code);   // hypothetical entry point
glBindTexture(GL_TEXTURE_2D, texture_foo);
glTexShader(GL_TEXTURE_2D, ts);             // hypothetical entry point

Of course. Perhaps I should have said “relevant” difference. It really doesn’t matter whether I have to call texture2D or a custom function in the main shader.

Originally posted by Overmind:
Of course. Perhaps I should have said “relevant” difference. It really doesn’t matter whether I have to call texture2D or a custom function in the main shader.
The relevant difference (at least for me) is that you can have, say, 100 textures utilizing different texture shaders and use them all with one linked program object by simply binding a different texture, without needing to link another program. Of course you can link a program for each possible combination, however you might end up with several thousand shader programs.

EDIT: This might be biased by the fact that I do not like the way GLSL links a combination of vertex and fragment shaders into one program. I consider the old assembly interface a much better one.

Originally posted by Komat:
The relevant difference (at least for me) is that you can have, say, 100 textures utilizing different texture shaders and use them all with one linked program object by simply binding a different texture, without needing to link another program. Of course you can link a program for each possible combination, however you might end up with several thousand shader programs.
You don’t have to end up with several thousand shader programs; that’s what uniforms and texture coordinates are for. I normally use at most a few shader programs, but I may have any number of textures to choose from; it’s all in how you write them.

What about the following scenario: define a shader that combines 3 color values in a certain way, where each of the 3 colors could be read from a texture or could be a constant color.

You would have to create a huge number of shaders to allow all possible combinations for sourcing each of the 3 colors from different sources: ConstantColor, Texture1D, Texture2D, TextureCubemap, …
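For illustration, the usual (clumsy) workaround today is preprocessor-driven variants, one compiled shader per combination; a sketch with made-up names:

   // the application prepends e.g. "#define SOURCE0 1" before compiling
   uniform vec4      u_constant0;
   uniform sampler2D u_tex0;

   #if   SOURCE0 == 0                        // constant color
   vec4 color0() { return u_constant0; }
   #elif SOURCE0 == 1                        // 2D texture
   vec4 color0() { return texture2D(u_tex0, gl_TexCoord[0].st); }
   #endif

   // ...repeated for color1() and color2(); with 4 possible sources per
   // slot that is already 4 * 4 * 4 = 64 compiled variants of one effect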

So what you end up with is writing a meta-shader system or shader-management system, which allows you to define shader fragments and combine them in a more orthogonal way.

I would like to see a standard for such a system on top of GLSL, to get better interoperability with other systems…

Originally posted by zeoverlord:
You don’t have to end up with several thousand shader programs; that’s what uniforms and texture coordinates are for. I normally use at most a few shader programs, but I may have any number of textures to choose from; it’s all in how you write them.
And how much do you trust the driver to optimize out unused code and to optimize program variants for the various combinations of uniforms? Especially if the code to calculate all variants simultaneously does not fit into hardware instruction count limits.

I am talking about procedural textures of different types where the calculation for each type might be complex, so if the driver decides to calculate all of them and select the correct one, it might be a problem.

You just write a single shader that combines the return value of three functions. Then you write different implementations for these functions.

You still have the problem that we lack some sort of virtual function call mechanism, so you still have to compile each implementation three times. But that’s hardly a “huge number” of shaders.

The problem you describe should be fixed at its source, by providing some sort of “function-as-value” or “virtual call” mechanism for GLSL, not by adding another shader type that fixes only a small part of the real problem.

EDIT: too slow :wink:
My post was in reply to Hampel.

The problem you describe should be fixed at its source, by providing some sort of “function-as-value” or “virtual call” mechanism for GLSL, not by adding another shader type that fixes only a small part of the real problem.
No. Absolutely not.

At no time should shaders start having the ability to pass functions around. Shader logic is simpler than CPU logic because, among other things, it treats the source code differently from the memory data. It has a structured memory layout and structured source code.

It would be valuable to the user to start throwing function pointers around, but the sacrifices in terms of shader performance… I’d rather not make that tradeoff.

If I can add a few words…

I would like to see one global shader that can handle the entire pipeline, from loading vertices to putting values into the color, stencil and depth buffers; a shader that can access all buffers and perform custom texture filtering, stencil and alpha functions, etc.

Now we have shaders at fixed places inside the pipeline, each with a very limited job to perform :slight_smile:

It would be valuable to the user to start throwing function pointers around, but the sacrifices in terms of shader performance…
I didn’t mean function pointers. Just some means to solve the following problem:

Let’s say I have a main shader that calls two abstract functions “vec4 A()” and “vec4 B()”. Now I would like to provide some implementations for these two functions.

This can currently be done, but only separately. If I want the same implementation for function A and function B, I have to write two different shaders. So I end up duplicating code, while the only difference is the function name.

The solution to this problem needs no function pointers, it can be statically resolved by the linker. It doesn’t need any new hardware functionality at all, just some way to say:

Function C, defined in another shader that’s going to be linked with the main one, is the implementation of the abstract function A, and function D, defined in yet another shader, is the implementation of B. That would make functions with the same signature interchangeable. I still need the same number of different program objects, but a lot fewer shaders.
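A minimal sketch of the current situation (file names made up): the prototypes can already be resolved by the linker, but reusing one implementation for both A and B means duplicating it under a second name:

   // main.frag
   vec4 A();
   vec4 B();

   void main()
   {
       gl_FragColor = A() + B();
   }

   // implA.frag -- one implementation, linked in as A
   vec4 A() { return vec4(1.0, 0.0, 0.0, 1.0); }

   // implB.frag -- exactly the same code again, only under the name B,
   // because there is no way to tell the linker "use this function as B"
   vec4 B() { return vec4(1.0, 0.0, 0.0, 1.0); }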

This should not have any negative performance implications, since the driver can still duplicate or inline the code at link time. On the contrary, the driver may decide whether the code should be duplicated or not. The more decisions the driver can take, the better for performance.

We’re discussing similar things; this wasn’t the main subject of this topic, but the title is broad enough…

Shaders (at least on my GeForce FX) tend to be incredibly slow as soon as I use functions, conditionals and loops (what you might call dynamic branching?). Are all of these going to be supported (or are they already, on more recent hardware)? Is it a hardware limitation, or just a software one?
What I’d like to say is that it would be really nice if I could write shaders just like I write C programs, where functions, conditionals and loops tend not to slow things down.
Can we really expect GPUs to look more like CPUs in their runtime logic operations? Or is it purely a dream?

Also, what about libraries (provided the previous features are available)? I can imagine loading a ‘library’ at runtime that would effectively upload only the used code (in order to avoid wasting memory). For example, I foresee using a FixedPipelineLighting function that belongs to a larger library, and switching it to PerFragmentLighting on the next run (or even while the program is live).
I know this is already doable, but I find that the existing mechanisms for it do not suit this very well.

The more decisions the driver can take, the better for performance.

I’m not sure about that point in general, but it’s pretty true if it happens at compilation or linking time, not during runtime.

Reducing the number of shaders would really be valuable, and I’m sure it’s worth doing. This might lead to what I called a ‘library’, maybe in a clumsy way, because it can be different from standard C libraries.

Originally posted by Overmind:
It would be valuable to the user to start throwing function pointers around, but the sacrifices in terms of shader performance…
I didn’t mean function pointers. Just some means to solve the following problem:

Let’s say I have a main shader that calls two abstract functions “vec4 A()” and “vec4 B()”. Now I would like to provide some implementations for these two functions.

This can currently be done, but only separately. If I want the same implementation for function A and function B, I have to write two different shaders. So I end up duplicating code, while the only difference is the function name.

The solution to this problem needs no function pointers, it can be statically resolved by the linker. It doesn’t need any new hardware functionality at all, just some way to say:

Function C, defined in another shader that’s going to be linked with the main one, is the implementation of the abstract function A, and function D, defined in yet another shader, is the implementation of B. That would make functions with the same signature interchangeable. I still need the same number of different program objects, but a lot fewer shaders.

This should not have any negative performance implications, since the driver can still duplicate or inline the code at link time. On the contrary, the driver may decide whether the code should be duplicated or not. The more decisions the driver can take, the better for performance.

Cg supports interfaces for exactly the use case you describe…