I have to partially agree with Matt here. While getting functions to automatically multi-pass on old hardware is a pretty thought, it's far from practical today. I see no way to make a shading setup reliably take advantage of a GeForce 4 while still scaling back to a TNT. It's just not reasonable, because the sets of functionality are so completely different.
However, I think we need to (at some point) stop writing a dozen different shader paths for everything. As I see it, the hardware coming up (GL2/DX9 compliant) should be flexible enough to do most anything we need to throw at it for quite a while. That is where we need to begin to simplify things, from that point forward. We should be able to just throw it a shader, have it calculate intermediate results to a p-buffer if required, and render the final result to the frame buffer as if it were a magical single pass.
As for dealing with supported features, we could still use some sort of caps bits and select how each feature is to be implemented. Something like:
I need to take the log base 7 of each fragment (I'm just making this up, OK?). I see the hardware supports logn (any base)? Well, I'll just use that. No wait, it only supports log10? Then I want the hardware to calculate log10(fragment)/constant_value(log10(7)). Oh wait, you mean the hardware has no log support at all? OK, then I want it to look the value up from this texture table using a dependent read. And so on.
The point is, we would get the ability to make simple, confined decisions. I can do a simple:
if (supported(LOG_BASE_N)) apply(LOG_BASE_N)
else if (supported(LOG_BASE_10)) apply …
Each decision would be confined to a single feature. We would no longer have to worry about the combinatorial explosion we face today. Today we say: oh, that's simple, I'll do this calculation instead… but now I need to break it into 2+ passes. So here is an optimized 4-texture version, a 6-texture version, an 8-texture version, a 4-texture version in case this other feature isn't supported and we have to break it into 3 passes, and so on.
So again, it's still on us to decide how to do each little part of the equation, but how to get that equation into 1 or more passes is the driver's responsibility.
I don't understand the invariance issues you are talking about, Matt. If I pass the hardware a single x/y/z with 25 sets of texture coordinates, the hardware should be able to keep coming up with the same fragment z on every pass. I understand some hardware isn't invariant between two different render setups, but I think maybe we need to progress to the point where we say it does have to be, and the IHVs make it happen. I personally think it's kind of ridiculous that on some cards, even making some stupid little render setup change means you lose invariance. Maybe there is a very real reason why hardware does it that way, but "because that's the way it has to be done" isn't a good enough excuse for me. Make the hardware so that it doesn't have to be done that way.
So maybe GL2/DX9 cards won't have all the features necessary to do this, but we need to make it a top priority to get these features in over the next couple of revisions, so we CAN do this for the long run.