you don’t like displacement mapping?
What displacement mapping?
Real displacement mapping involves shifting the location of the surface per-fragment. Doing it per-vertex is nothing more than a hack.
In any case, vertex shaders can’t do the really hard part of displacement mapping anyway: the tessellation. And, if the hardware designers are smart, they never will (tessellation should go into a 3rd kind of program that feeds vertices to a vertex program, so that the two can run asynchronously). So, in order to do automatic displacement mapping, you still have to do a render-to-vertex-array to tessellate. Since you’re writing vertex data from your fragment program, you may as well use its texturing facilities to do the displacement.
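To spell out that round trip, here is a minimal sketch of the readback path, assuming the fragment program has already rendered the displaced positions as float RGB into a w-by-h region of the framebuffer. displaced_to_vbo is a name I made up, and real render-to-vertex-array extensions keep the data on the card instead of bouncing it through host memory, but the data flow is the same:

```c
#include <stdlib.h>
#include <GL/gl.h>
#include <GL/glext.h>   /* ARB_vertex_buffer_object entry points */

/* Pull fragment-program output back and rebind it as vertex data. */
void displaced_to_vbo(GLuint vbo, GLint w, GLint h)
{
    GLsizei  bytes = w * h * 3 * sizeof(GLfloat);
    GLfloat *pos   = malloc(bytes);

    /* each RGB texel holds one displaced vertex position */
    glReadPixels(0, 0, w, h, GL_RGB, GL_FLOAT, pos);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, bytes, pos, GL_STREAM_DRAW_ARB);
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0); /* offset 0 in VBO */

    free(pos);
}
```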
Now, I do like the idea of binding arbitrary memory to a vertex shader. However, this is different from a texture access.
If you have a 16x1 texture, accessing texel 3.5 has some meaning: with bilinear filtering, it means fetching a 50/50 blend of texels 3 and 4.
This has absolutely no meaning for the kind of memory I’m talking about. For example, let’s say I bind a buffer of memory containing skinning matrices to a vertex shader. The way this should work is that it takes only integer indices as arguments. Matrix 3.5 has no meaning, and a blend of the 16 floats that “matrix 3.5” would represent is the absolute wrong thing to do, as the sketch below shows.
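To make that concrete, here is the rule a bilinear filter applies, written out in C (illustrative only; fetch_filtered is a made-up name, and it assumes the index stays in bounds):

```c
/* The 1D bilinear rule a texture unit applies to a fractional index.
   At index 3.5 this returns 0.5*data[3] + 0.5*data[4]. */
float fetch_filtered(const float *data, float index)
{
    int   i = (int)index;           /* 3   */
    float t = index - (float)i;     /* 0.5 */
    return (1.0f - t) * data[i] + t * data[i + 1];
}
```

Apply that per-component across the 16 floats each of matrices 3 and 4 and you get a “matrix” that is not a valid transform of anything; the only sensible fetch is matrix 3 or matrix 4, whole.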
Also, textures are not updated frequently. And when they are, they are usually updated via render-to-texture or a copy-texture function, not from main-memory data. However, 9 times out of 10, memory bound to a vertex shader is updated every time the shader is used. So you don’t want the texture-accessing functionality here; instead, you want an API more akin to VBO (you could even use a buffer object for the binding, since that API suits this kind of use so very well).
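In buffer-object terms, the API I’m imagining would look something like this sketch. glGenBuffersARB, glBindBufferARB, and glBufferDataARB are the real ARB_vertex_buffer_object calls; glBindProgramMemoryARB and its slot parameter are invented for illustration:

```c
/* Sketch of a VBO-style API for binding raw memory to a vertex shader. */
GLuint boneBuf;
glGenBuffersARB(1, &boneBuf);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, boneBuf);

/* STREAM usage, because this data changes nearly every time the
   shader runs (numBones and boneMatrices are assumed to exist) */
glBufferDataARB(GL_ARRAY_BUFFER_ARB, numBones * 16 * sizeof(GLfloat),
                boneMatrices, GL_STREAM_DRAW_ARB);

/* Hypothetical call: attach the buffer to the vertex program as raw,
   integer-indexed memory rather than as a filterable texture. */
glBindProgramMemoryARB(GL_VERTEX_PROGRAM_ARB, 0, boneBuf);
```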
because in the end, VS and PS should, and so could, be implemented the very same way… that could mean, for example, that in a fillrate-intensive situation you could use 6 of your 8 pipelines for pixel shading and 2 for vertex shading… in a vertex-intensive situation, 5 for VS and 3 for PS…
Yeah, this makes sense. Especially considering how they sit on very different ends of the pipeline. And how they process data fed from entirely different places. And the 1001 other major differences between vertex and fragment shaders that make this a horrible idea from a hardware-implementation standpoint.
Plus, for optimal performance, you want to pipeline vertex shaders like a CPU: deep pipelining, with a long sequence of instructions in flight at once. For a fragment shader, you want to pipeline like pixel pipes: wide pipelining, with multiple copies of the same instruction executing at the same time. Why?
Because vertex programs must operate sequentially: the setup unit has to get each vertex in turn. It does no good to spit out 2 or 3 vertices at once; indeed, this is incredibly bad for a short vertex shader (one shorter than the time it takes the setup unit to process 2 or 3 verts). It also complicates the setup logic, which now must somehow know the order of those triangles. Each fragment of a single triangle, however, is completely independent of the others, so it makes sense to do them in parallel.
Nailing down the ‘assembly’ language is bad.
Odd. Intel, apparently, thought this was a very good idea (until recently, with IA64, but AMD is taking up the reins). Allow me to explain.
A CISC chip, like most Intel chips, works by emulating an instruction set. The P4 reads x86 instructions, converts them (using a program written in what Intel calls ‘microcode’) into native instructions, and then executes those native instructions.
The thought behind this concept is that you can compile programs to a single assembly language that runs on multiple generations of a processor, which is why code compiled for a 286 still runs on a Pentium 4 (to the extent that the OSes allow it).
If you take the analogy to graphics cards, the card itself is the underlying generation of hardware, the assembly represents the x86 instruction set, and the microcode is the assembler that runs when you create the program. So, really, SirKnight’s idea is nothing more than how modern instruction sets already work.
Granted, ARB_vertex_program and ARB_fragment_program are not quite good enough to immortalize as a finalized ISA-equivalent. However, it is hardly fair to say that this idea is bad; after all, it is the basis of why your computer works today (unless you’re not using a PC).
You might say that graphics hardware is evolving faster than CPUs did. However, wanting to stick to the x86 ISA didn’t stop Intel from exposing MMX or SSE instructions; they were simply extensions to the ISA. Unless there is a foreseeable change that fundamentally alters how the assembly would look (beyond merely adding new opcodes), there isn’t really a problem.
However, there is one really good thing that comes out of glslang being part of drivers: shader linking. Supposedly, you can compile two vertex shaders and link them such that one shader will call functions in the other. In a sense, compiled shaders are like .obj files, and the fully linked program is like a .exe.
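With the ARB_shader_objects entry points, that model looks like this (mainSrc and libSrc stand in for the two vertex shader strings; error checking omitted):

```c
/* Two vertex shaders compiled separately, then linked into one
   program; functions defined in libVS become callable from mainVS. */
GLhandleARB mainVS = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);
GLhandleARB libVS  = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);

glShaderSourceARB(mainVS, 1, &mainSrc, NULL);
glShaderSourceARB(libVS,  1, &libSrc,  NULL);
glCompileShaderARB(mainVS);   /* like producing two .obj files... */
glCompileShaderARB(libVS);

GLhandleARB prog = glCreateProgramObjectARB();
glAttachObjectARB(prog, mainVS);
glAttachObjectARB(prog, libVS);
glLinkProgramARB(prog);       /* ...and linking them into the .exe */
glUseProgramObjectARB(prog);
```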
Of course, with a rich enough assembly spec (richer than either of the ARB extensions), you could still have this facility: you would give the driver an array of shaders to compile together, and the assembly would have to retain function names in some specified fashion. At that point, granted, nobody will want to write the assembly by hand anymore, but that’s OK.
One of the very reasons for high level languages is the opportunity to eliminate diverse middle interfaces, to say goodbye to multiple codepaths, and still get the best possible hardware utilization.
So, why do you support glslang, when it clearly doesn’t offer this (as I have mentioned in other threads)? Outside of that library of yours that you are writing, which ultimately differs very little from Cg’s low-end profiles.