Programmable hardware state management

This is a non-urgent question for those of you who use vertex programs and register combiners in large projects. How do you keep the number of programs from growing exponentially?

For instance, with normal OpenGL, you have nearly infinite variety in setting up the OGL state machine by making many little function calls. When you use vertex programs, though, you have to write a separate program depending on whether you need one light, two lights, distance fog, height fog, diffuse only bump mapping, diffuse and specular bump mapping, diffuse bump mapping with distance fog, no bump mapping with height fog, etc.

So how do you deal with all the permutations of possible states? Do you call functions to concatenate strings to make the appropriate program? Always use some slow uber-program and send parameters to disable the parts you don’t need?
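A minimal sketch of the string-concatenation idea, in Python for brevity and assuming NV_vertex_program (!!VP1.0) syntax. The constant-register layout and the function names are hypothetical. Each feature flag appends a snippet to the program text:

```python
# Hypothetical constant layout:
#   c[0]..c[3]  rows of the modelview-projection matrix
#   c[4]        ambient colour
#   c[5].x      0.0, used to clamp N.L at zero
#   c[6]        negated third row of the modelview matrix (eye distance for fog)
#   c[8+2i]     object-space direction towards light i
#   c[9+2i]     colour of light i

TRANSFORM = "".join(f"DP4 o[HPOS].{axis}, c[{row}], v[OPOS];\n"
                    for row, axis in enumerate("xyzw"))

DISTANCE_FOG = "DP4 o[FOGC].x, c[6], v[OPOS];\n"


def light_snippet(i):
    """Add one directional light's diffuse term into R1."""
    direction, colour = 8 + 2 * i, 9 + 2 * i
    return (f"DP3 R0.x, v[NRML], c[{direction}];\n"
            "MAX R0.x, R0.x, c[5].x;\n"
            f"MAD R1, R0.x, c[{colour}], R1;\n")


def build_vertex_program(num_lights, distance_fog):
    parts = ["!!VP1.0\n", TRANSFORM]
    if num_lights:
        parts.append("MOV R1, c[4];\n")            # start from ambient
        parts.extend(light_snippet(i) for i in range(num_lights))
        parts.append("MOV o[COL0], R1;\n")
    else:
        parts.append("MOV o[COL0], v[COL0];\n")    # unlit: pass colour through
    if distance_fog:
        parts.append(DISTANCE_FOG)
    parts.append("END\n")
    return "".join(parts)
```

Each independent flag doubles the number of distinct programs: four light counts times fog on/off already gives eight. That growth is exactly what the question is about. Generating the text at least means nobody writes those eight by hand.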

Even if you don’t have “the” solution, I’d be interested in starting a discussion of the topic.


Well, what I have been doing is making separate vertex programs for whatever I need to do at a particular point in the program. But I have thought about the same thing you are asking now. If games like the new Doom use separate programs, then there must be zillions. I like your idea of concatenating strings together to generate programs, though.


The concatenation approach is one that ATi’s extensions support much better than nVidia’s, since the programs are defined by function calls rather than string opcodes. But, you can always turn the opcode approach into the function call approach (and vice-versa). It just requires a bit of work.
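One way to see that the two styles are interchangeable is to keep the program as a list of (opcode, dst, srcs) tuples. You can then either print it as opcode text or replay it as function calls. This is a hypothetical sketch: `to_text`, `from_text` and `replay` are made-up names, and the ATI side (EXT_vertex_shader calls made inside the callback) is not shown.

```python
def to_text(ops):
    """Serialise (opcode, dst, *srcs) tuples as NV_vertex_program-style text."""
    body = "".join(f"{op} {', '.join(args)};\n" for op, *args in ops)
    return "!!VP1.0\n" + body + "END\n"


def from_text(text):
    """Parse the text form back into (opcode, dst, *srcs) tuples."""
    ops = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("!!VP1.0", "END"):
            continue
        opcode, operands = line.rstrip(";").split(None, 1)
        ops.append((opcode, *(o.strip() for o in operands.split(","))))
    return ops


def replay(ops, emit):
    """Drive a function-call style API by calling emit(opcode, dst, *srcs)
    once per instruction; a vendor back end would do its real calls in emit."""
    for op in ops:
        emit(*op)
```

For example, `from_text(to_text(ops))` gives back the same list. `replay(ops, callback)` feeds the same instructions to whatever function-call interface the callback wraps.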

shader PerPixelDiffuse
shader ShadowVolumeExpansion
shader Sky
shader Grass

It’s not exponential… just use the SAME shaders for all the stuff.

just use the SAME shaders for all the stuff

Yeah, yeah, just hope no one wants to shine a flashlight on your grass.

– Zeno

A struct describing what a shader is supposed to do.

A per-hardware implementation of “compile struct to shader object”.

A cache that caches the last N used shader objects, and/or shaders which are marked “locked”, and/or shaders which have been used during the last M frames.

The per-hardware implementation will concatenate strings for nVIDIA, and call functions in if/switch statements for ATI.
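Here is a rough sketch of that cache. It assumes a hashable description struct and a per-hardware compile function passed in as a callback. `ShaderDesc`, `ShaderCache` and their fields are hypothetical. The eviction rule follows the three conditions above: among the last N used, locked, or used in the last M frames.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderDesc:
    """Hashable description of what a shader should do (illustrative fields)."""
    num_lights: int = 0
    fog: str = "none"    # "none", "distance" or "height"
    bump: str = "none"   # "none", "diffuse" or "diffuse_specular"


class ShaderCache:
    """Keeps a compiled shader if it is among the last `max_entries` used,
    is locked, or was used within the last `keep_frames` frames."""

    def __init__(self, compile_fn, max_entries=32, keep_frames=4):
        self.compile_fn = compile_fn      # per-hardware "compile struct to shader"
        self.max_entries = max_entries
        self.keep_frames = keep_frames
        self.entries = OrderedDict()      # desc -> [shader, last frame used]
        self.locked = set()
        self.frame = 0
        self.compiles = 0

    def get(self, desc):
        entry = self.entries.get(desc)
        if entry is None:
            entry = [self.compile_fn(desc), self.frame]
            self.compiles += 1
            self.entries[desc] = entry
        else:
            self.entries.move_to_end(desc)    # mark as most recently used
            entry[1] = self.frame
        self._evict()
        return entry[0]

    def lock(self, desc):
        self.locked.add(desc)

    def end_frame(self):
        self.frame += 1
        self._evict()

    def _evict(self):
        # Least recently used entries come first. Evict only while over
        # budget, and never evict locked or recently used shaders.
        for desc in list(self.entries):
            if len(self.entries) <= self.max_entries:
                break
            last_used = self.entries[desc][1]
            if desc in self.locked or self.frame - last_used < self.keep_frames:
                continue
            del self.entries[desc]
```

Making the description hashable (a frozen dataclass here) is what lets it double as the cache key.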

Jwatte -

Very interesting. Tell me if I’m understanding you correctly:

You might have things like a point-light class or a linear fog class (probably derived from some base-class with a virtual compile function). Then, you could set up some outside object that can hold a linked list (?) to these structures and “compile” a list of them into a shader string?

How would you deal with passing the results from one object to the next? Always use a few register ranges for input and a few for output? Wouldn’t this give you lots of illegal ways to link things up?

Sorry for all the questions, but I’m really interested in this idea.


This thread kinda got lost in all the OT poop.
I hope someone can answer Zeno’s questions.


How To Ask Questions The Smart Way

No, it’s more pedestrian than that. Basically, the struct is really big, and contains the superset of everything you might want to do in a shader. Then the “compiler” looks at which “active” flags are actually set.

The registers that are in/out are, in the basic version, simply fixed for each stage, although you can easily do some register coloring and/or numbering compaction just to make it challenging (well, the compaction is not very challenging). Unless you have an amazing appetite, you should be able to make do with just fixed assignments.
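Below is a sketch of the fixed-assignment version. It assumes one big state struct and a fixed, ordered list of stages, each reading and writing fixed temporary registers. The stage names and the register convention are made up for illustration. The check in `plan` is one cheap way to catch the illegal link-ups Zeno asked about when a stage table is wrong.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderState:
    """Superset of features; the compiler only looks at the active flags."""
    lighting: bool = False
    bump_diffuse: bool = False
    distance_fog: bool = False
    height_fog: bool = False


# (stage name, active?, registers read, registers written), in emission order.
# Hypothetical fixed convention: R1 = eye-space position, R2 = eye-space
# normal, R3 = lit colour, R4 = tangent-space light vector.
STAGES = [
    ("eye_position", lambda s: True, set(), {"R1"}),
    ("eye_normal", lambda s: s.lighting, set(), {"R2"}),
    ("tangent_light", lambda s: s.bump_diffuse, {"R1"}, {"R4"}),
    ("lighting", lambda s: s.lighting, {"R1", "R2"}, {"R3"}),
    ("distance_fog", lambda s: s.distance_fog, {"R1"}, set()),
    ("height_fog", lambda s: s.height_fog, {"R1"}, set()),
]


def plan(state, stages=STAGES):
    """Return the active stages in order, checking that every fixed register
    a stage reads was written by an earlier active stage."""
    written, active_names = set(), []
    for name, is_active, reads, writes in stages:
        if not is_active(state):
            continue
        missing = reads - written
        if missing:
            raise ValueError(f"{name} reads unwritten registers {sorted(missing)}")
        written |= writes
        active_names.append(name)
    return active_names
```

Each active stage would then append its own opcode snippet. With fixed registers, the only link-time question is whether a stage's inputs exist.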

And, yes, if you compile a shader which requires, say, tangent-space basis vectors, and don’t bind a tangent-space basis vector as input when you render, it won’t work. Ideally you catch this at a higher level :)
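Catching that at a higher level could look something like the following. The idea is to derive the vertex attributes a description needs and compare them against what is bound before drawing. The field names are hypothetical. Passing the tangent in v[TEX1] is an assumption, not something the thread specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderDesc:
    lighting: bool = False
    bump: bool = False
    textured: bool = False


def required_attributes(desc):
    """Vertex attributes (NV_vertex_program v[] names) a shader will read."""
    attrs = {"OPOS"}
    if desc.lighting or desc.bump:
        attrs.add("NRML")
    if desc.textured or desc.bump:
        attrs.add("TEX0")
    if desc.bump:
        attrs.add("TEX1")   # assumption: tangent vector supplied in v[TEX1]
    return attrs


def check_bindings(desc, bound):
    """Raise before drawing if the shader would read an attribute that isn't bound."""
    missing = required_attributes(desc) - set(bound)
    if missing:
        raise ValueError(f"shader needs unbound attributes: {sorted(missing)}")
```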

A group at Stanford did some work on something similar a while ago – they designed a hardware-independent shader language which was then compiled when necessary for various platforms. For example, a particular scene they use with bowling balls was constructed in 8 passes on a GF2 and 2 on the GF3. Pretty cool tech, if you ask me.

Thanks for the bump, PK.

Jwatte - I see. It’s basically the same thing, whether there is a struct per block or one struct containing multiple blocks. You just want to chain them together according to the current state.

After more thinking, it would be easy to chain things together: each time you add a block, you could pass two integer arguments that designate the input and output (if necessary) registers.
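Here is a minimal sketch of that idea. It assumes each block is an NV_vertex_program text template with {inp} and {out} placeholders for temporary register numbers. The templates, the `ChainBuilder` name and the constant layout (c[4] = light direction, c[5].x = 0) are hypothetical.

```python
# Register-parameterised blocks; {inp} and {out} are temporary register numbers.
NORMALIZE = ("DP3 R{out}.w, R{inp}, R{inp};\n"
             "RSQ R{out}.w, R{out}.w;\n"
             "MUL R{out}.xyz, R{inp}, R{out}.w;\n")
DIFFUSE = ("DP3 R{out}.x, R{inp}, c[4];\n"      # assumed: c[4] = light direction
           "MAX R{out}.x, R{out}.x, c[5].x;\n")  # assumed: c[5].x = 0.0


class ChainBuilder:
    MAX_TEMP = 11   # NV_vertex_program provides temporaries R0-R11

    def __init__(self):
        self.parts = []

    def add(self, template, inp, out):
        """Append one block, wiring its input and output to the given registers."""
        for reg in (inp, out):
            if not 0 <= reg <= self.MAX_TEMP:
                raise ValueError(f"R{reg} is not a temporary register")
        self.parts.append(template.format(inp=inp, out=out))
        return self

    def body(self):
        return "".join(self.parts)
```

For instance, `ChainBuilder().add(NORMALIZE, 2, 2).add(DIFFUSE, 2, 3)` normalizes a vector held in R2 in place and then writes the clamped N.L into R3.x. Range-checking the register numbers only catches out-of-range registers. It does not check that R2 actually holds a normal.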

Anyway, haven’t we been on topic long enough? Jwatte, are those your rants at

– Zeno

[This message has been edited by Zeno (edited 03-30-2002).]

Originally posted by thewizard75:
A group at stanford did some work on something similar a while ago – they designed a hardware-independent shader language which was then compiled when necessary for various platforms.

Skimming through the SIGGRAPH 2001 proceedings, I think you’re talking about:
A Real-Time Procedural Shading System for Programmable Graphics Hardware
Proudfoot, K., Mark, W., Tzvetkov, S., Hanrahan, P.

Tzvetkov is an nVidia guy, the rest are from Stanford. I haven’t gone over the paper in great detail, but it looks like cool stuff. I wonder how difficult it would be to integrate their shading language into a personal project. The link to their webpage is here:
Does anyone have any experience with this system?

Anyway, haven’t we been on topic long enough?

I think I’ve had my fill of OT poop. Thanks anyway.



[This message has been edited by PK (edited 03-30-2002).]

PK - Their work is really great stuff… I’d love to be able to do my programming with their shader language.

Unfortunately, though, the last time I tried their demo, several states crashed my computer when rendering. I figured it’s probably not ready for prime time yet and haven’t tried integrating it with any projects. Maybe it’s time I gave it another try.

– Zeno

Also, it’s currently only usable with immediate mode rendering; vertex arrays are not supported.

And that makes it unusable for now (too slow). Too bad, because I really like it.

Are they actually still working on it?