I recently made the assertion that automatic multipass is an impossible problem to solve in a general way. This is not the sound-barrier or four-minute-mile kind of impossible, but the 2+2=5 kind of impossible.
People seem to be assuming that a compiler can do what John Carmack has done for Doom 3. That is, take a single
expression of a lighting system (say written in GL2 shading language) and compile it to fit into several different
types of rendering paths. This includes breaking it into multiple passes.
I define multiple passes as sending vertices and fragments through the pipeline several times, with non-final passes producing results (temporaries) that later passes consume. The temporaries can be stored in the RGBA channels of the framebuffer, in the stencil buffer, or in rendered-to texture maps.
I define the problem of automatic multipassing as follows: take a vertex and fragment program P, which could run in a single pass on hardware with enough resources, and break it into several sub-programs which each use fewer resources and must be run in series. Each sub-program (P1, P2, … PN) would be a pass. Each produces temporaries which are consumed by the next program, until PN outputs the final fragments.
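As a toy sketch of that definition (all names here are hypothetical illustration, not a real GPU API; a real pass runs on the GPU, not in Python), imagine a fragment program split into passes that hand temporaries forward through simulated framebuffer storage:

```python
# Toy model of the multipass definition: a program is a list of passes
# P1..PN; each non-final pass writes temporaries (standing in for the
# framebuffer RGBA / stencil / render-to-texture storage) and the next
# pass reads them, until PN emits the final fragment.

def p1(fragment, temps):
    # First pass: compute a diffuse term and stash it as a temporary.
    temps["diffuse"] = fragment["n_dot_l"] * fragment["light"]
    return temps

def p2(fragment, temps):
    # Final pass PN: combine the stored temporary with the base color.
    return fragment["base"] * temps["diffuse"]

def run_multipass(fragment, passes):
    temps = {}                          # simulated temporary storage
    for intermediate in passes[:-1]:
        temps = intermediate(fragment, temps)
    return passes[-1](fragment, temps)  # PN outputs the final fragment

frag = {"n_dot_l": 0.5, "light": 2.0, "base": 0.25}
print(run_multipass(frag, [p1, p2]))   # prints 0.25
```

The point of the sketch is only the shape of the problem: the hard part is producing P1 … PN automatically from a single program P, which is everything that follows.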
I have to modify this definition, though, because I cannot honestly hold it up as stated; I could be accused of strawman tactics. It should be obvious that not all programs can be broken into several chunks which run in series to the end and then stop. If a branch in P5 references P1, then P1 would have to be reloaded. At that point the driver cannot even tell you how many passes a program will take (think halting problem; and if that doesn't convince you, realize at least that you would have to tell the driver what the data will be, because the number of passes becomes data-dependent).
Branching gives yet more obvious problems if every other vertex can branch in a different way. It is imaginable that every single vertex in a mesh would require a different chunk of P to be loaded! So, at any moment, all the vertices in your mesh together may require the sum total of all the resources your shader will ever call upon. It would therefore be impossible to choose any one chunk of P (which uses fewer resources) to run even a single pass over all vertices, if the shader cannot call upon all resources at once. What good does breaking a program apart do if you may have to load all of it at once anyway?
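A trivial illustration of the data-dependence point (a hypothetical shader loop, sketched in Python): if the trip count of a loop comes from per-vertex data, then "how many passes will this take?" has no answer until you are handed the data:

```python
# Hypothetical shader whose loop count depends on per-vertex data.
# If each iteration blew past the hardware's resource limits and had to
# become its own pass, the pass count would differ per vertex -- it
# cannot be queried up front without the actual data.

def passes_needed(vertex_height, step=0.1):
    # e.g. an iterative relief-mapping-style march: step until the
    # accumulated depth covers the sampled height value.
    depth, iterations = 0.0, 0
    while depth < vertex_height:
        depth += step
        iterations += 1
    return iterations

print(passes_needed(0.25))  # 3 iterations for this datum
print(passes_needed(0.95))  # 10 for another -- same shader, different data
```

Two vertices running the identical program need different numbers of iterations, which is the whole problem in miniature.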
Some have suggested multipass per primitive, but the paragraph above suggests you need multipass per vertex. Does
that even make sense?
Maybe I cannot really define automatic multipass at all. I was hoping I could at least give a fair definition with an actual meaning beyond 'automatic shader multipassing is when shaders automatically multipass.'
Let's say we use a Quantum GPU to solve the problems above…
Can we really figure out what P1, P2, … PN should be? At first blush one might think you could just take the assembly and insert context-saving commands at the beginning and end of each chunk, like multitasking on a CPU. But the more tightly interleaved the resource usage, the smaller the chunks become; at some point the chunks get so small that a much better route is to forget the GPU entirely and run the shader on the host CPU. Basically, I am saying that you cannot look at the assembly code to figure out how to break up P.
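To make that concrete, here is a deliberately naive chunker (not real driver code; the instruction names and resource model are invented for illustration) that cuts a straight-line instruction stream whenever the set of live resources would exceed a hardware limit. With tightly interleaved resource usage, the cuts land at nearly every instruction:

```python
# Naive chunker: walk a straight-line "assembly" listing and start a new
# chunk (pass) whenever the instructions gathered so far would touch more
# distinct resources (registers / texture units) than the hardware allows.
# Purely illustrative -- real resource allocation is far more involved.

def chunk(instructions, limit):
    chunks, current, used = [], [], set()
    for inst, resources in instructions:
        needed = used | set(resources)
        if current and len(needed) > limit:
            chunks.append(current)      # a context save/restore would go here
            current, needed = [], set(resources)
        current.append(inst)
        used = needed
    if current:
        chunks.append(current)
    return chunks

# Every instruction touches fresh resources, so with a limit of 2 the
# "passes" degenerate to a single instruction each.
program = [("mad", ["r0", "t0"]), ("mul", ["r1", "t1"]),
           ("add", ["r2", "t2"]), ("dp3", ["r3", "t3"])]
print(chunk(program, limit=2))  # [['mad'], ['mul'], ['add'], ['dp3']]
```

Four passes of one instruction each, with a full context save/restore between every pair, is exactly the degenerate case where doing the work on the host CPU wins.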
One would have to analyze the high-level code of the shader to determine how whole statements can be rewritten and chunked. Suddenly P1, P2, … PN are not straightforward pieces of the original program, but pieces of P which have been rewritten, rearranged, and mangled so that they can run separately and spit out intermediate results which the next chunk can consume, with each chunk using resources in a way that fits the GPU. What's so hard about that? Well, I believe that analyzing a program this way is equivalent to writing a program which tells you whether another program will halt and give you an answer (that is impossible, so chunking a program this way is impossible too). Even if someone could show me how to do this, the other reason I gave above still stands: how do you take these chunks and actually apply them to more than one vertex at a time?
I wish a driver writer would chime in and back me up on this. Someone, cass I think, brought up something which sounded similar in that it invoked Turing, but did not go into detail.
If anyone feels like mentioning the Stanford Shader Language, or the paper showing that dependent texture reads and floating-point buffers are all that is needed to implement RenderMan in OpenGL, let me remind you that these are fundamentally different approaches to the problem of shading. People want shaders, compiled to opcodes, broken into as few passes as the compiler can figure out (preferably as few as possible). The Stanford system instead compiles shaders into predefined passes which act as the shading language's opcodes. There is a big difference between 'compile my shader into a series of predefined passes' and 'break my shader up into arbitrary passes, each doing as much as it can.'
My tentative conclusion is that auto-multi-pass is only possible if you have predefined pass types which you can
treat as opcodes for the shading language itself. That is approaching the problem from the other direction.
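The "other direction" can be sketched like this (pass names hypothetical, loosely in the spirit of how I understand the Stanford approach, not its actual implementation): the compiler only ever emits passes from a fixed menu, so the pass structure is known by construction rather than discovered by analyzing arbitrary code:

```python
# Sketch of "predefined passes as opcodes": shader expressions compile
# down to a fixed menu of pass types, so the number and kind of passes
# is known by construction. Names are invented for illustration.

PASS_TYPES = {
    "modulate": lambda a, b: a * b,   # e.g. blend two texture results
    "add":      lambda a, b: a + b,   # e.g. additive light accumulation
}

def compile_to_passes(expr):
    """expr is a nested tuple like ("add", ("modulate", 0.5, 0.5), 0.25);
    returns (list of emitted pass types, simulated final value)."""
    passes = []

    def emit(node):
        if not isinstance(node, tuple):
            return node               # literal input, no pass needed
        op, left, right = node
        result = PASS_TYPES[op](emit(left), emit(right))
        passes.append(op)             # one predefined pass per operation
        return result

    final = emit(expr)
    return passes, final

shader = ("add", ("modulate", 0.5, 0.5), 0.25)
print(compile_to_passes(shader))  # (['modulate', 'add'], 0.5)
```

Note what this buys you: the pass count is simply the number of operations in the expression, computable without running anything, precisely because each "opcode" is itself a whole predefined pass.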
I would be the first to admit that I could be hugely wrong or sorely mistaken or both. But please, I am looking for
‘counter-proof by example’ here, which means that I’ll only change my mind if you can tell me an actual method by
which my concerns can be addressed.
The first thing I expect is for people to say that I have defined the problem completely wrong. But I do not think I have; I have gleaned this definition from what people have said they want, and from how they expect such a system to behave. For instance, a recent post suggested that one should be able to query how many passes a shader will take, and many people seem to believe this is possible. But it is obviously a mistaken notion, because one simply cannot predict how many passes a shader will take without considering the vertex data and current state, and even then, anyone with a computer science degree should know that you cannot predict the outcome of a program without running it (even if it's only in your own head).
Wow, this is long. Who am I to set everyone straight anyway…