The end of vertex programs?

With the release of NV_vertex_program2, I would like to submit for discussion the notion that vertex programs may be dead.

What I mean by that is that they are feature-complete. They already have exp, log, sin, cos, etc., and they’ve got loops and conditional branches too. Really, the only improvements to be had are either performance enhancements or more registers and opcodes (input, output, temporary, or constant).

Actually, now that I think about it, there is one more, rather frightening, thing a VP could have: direct memory access. I don’t like this idea personally, as it seems to violate some basic ideas about what vertex programs should, and should not, do. I feel that vertex programs should be given a specific set of data to operate on.

Instead, I would propose the following: rather than giving vertex programs free rein over memory, let’s create a new type of program: the command program.

The command program is given pointers to the various vertex arrays (with strides and so forth) and a pointer to the indices. It may also be given pointers to other information (at the user’s control). The command program has bound to it one vertex program (possibly more, but that would be even harder to implement in hardware) that it can call to send verts down the pipe. When it calls a vertex program, it gives it a pointer to what may well be a register file containing the constant registers. As arguments, it sends per-vertex attributes. It is important to note that the command program also has direct control over how the system handles the vertex data (strips, fans, lists, etc.).

A standard command program will loop over the number of indices for the primitive type, run a vertex program on the data for each index, and finish. A more complex command program might be given a displacement map and start tessellating the vertex data. It could perform NURBS tessellation in hardware. A command program could even pick matrices out of a large set for skinning purposes.
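
Something like this, in C (every name here, from the struct layout to the vp_func signature, is made up, just to pin down the data flow of that “standard” command program):

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* The bound vertex program: one vertex in, one vertex out. */
typedef Vec3 (*vp_func)(const Vec3 *attrib, const float *constants);

/* Example vertex program: translate by a constant offset. */
static Vec3 vp_translate(const Vec3 *a, const float *c)
{
    Vec3 r = { a->x + c[0], a->y + c[1], a->z + c[2] };
    return r;
}

/* The "standard" command program: loop over the index array, fetch each
   vertex's attributes through a base pointer + stride, and hand them to
   the bound vertex program along with the constant register file. */
static void run_command_program(const unsigned char *vertex_base, size_t stride,
                                const unsigned *indices, size_t index_count,
                                const float *constants, vp_func vp, Vec3 *out)
{
    for (size_t i = 0; i < index_count; ++i) {
        const Vec3 *v = (const Vec3 *)(vertex_base + indices[i] * stride);
        out[i] = vp(v, constants);   /* emit one transformed vertex */
    }
}
```

A tessellating command program would differ only in that it could run the loop more times than there are indices, emitting vertices it computed itself.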

In order for command programs to be useful, they must be both powerful and reasonably fast. A command program will need a temporary memory buffer so that it can properly play with vertex data. But, aside from that, it gives the user the ability to tinker with vertex data and create new polys while still not making vertex programs do something they probably shouldn’t be involved in.

What about the notion of lookup tables in vertex programs (aka textures)?

You could slap whatever you want into a lookup table, bind it to the vertex processor, and then do something like

for (int i = 0; i < vertices_in_mesh; ++i) { /* submit i as the vertex’s only attribute */ }

Then you use your 1D int vertex ‘coordinates’ as offsets into your lookup tables and start working from there in the vertex program.
That would give you all the power you could want, I guess.
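
A minimal sketch of that scheme in C (all the names here are hypothetical; this just pins down the idea): the mesh’s only per-vertex attribute is an integer, and the “vertex program” dereferences the bound table with it.

```c
/* An entry in the bound lookup table, which plays the role of a
   "texture" holding the real per-vertex data. Layout is illustrative. */
typedef struct { float pos[3]; float nrm[3]; } TableEntry;

static TableEntry fetch(const TableEntry *table, int coord)
{
    return table[coord];          /* nearest lookup, nothing fancy */
}

/* The loop from the post: each vertex's single attribute is just i,
   and the vertex program uses it as an offset into the table. */
static void run_mesh(const TableEntry *table, int vertices_in_mesh,
                     TableEntry *out)
{
    for (int i = 0; i < vertices_in_mesh; ++i)
        out[i] = fetch(table, i);
}
```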

The possibility of one vertex in, multiple vertices out sounds like something that could be interesting. Or even getting to the point where fragment programs can generate new vertices (of course, conditional and loop statements would need to be added to fragment programs first).

I like the idea about the vertex program sampling a texture as well.


The point I was trying to make is that such functionality belongs in a different paradigm. Vertex programs should still be one-vert-in, one-vert-out; they should remain fairly simple (so that they remain fast). A more complicated program can run at the level above the vertex program, and it gets access to more information (like the entire vertex array, the entire index array, etc).

I’m simply saying that vertex programs don’t need to go off into memory. Since the user may want this functionality, give it to him, but put it somewhere else. It is simply a matter of putting functionality in the place where it most belongs.

Oh, and I’m very much against fragment programs ever generating vertices. This just doesn’t make sense. That’s the kind of useless functionality someone wants when they are trying to do something that doesn’t really have anything to do with rendering a scene (which is what all of these things are for, btw). I’m very much against putting features into hardware for simple novelty use. If something’s going in, there had better be a real-world, practical, rendering use for it.

I can come up with real-world uses for the command program and that kind of functionality. I can’t come up with one for fragment programs spawning vertices directly. Besides, if you need that functionality, you can now render to a buffer and use that as a vertex array. Nor can I come up with a situation where allowing vertex programs to spawn vertices does anything that command programs cannot (and command programs are more flexible, as they actually do the vertex fetches from the arrays themselves).

I was thinking along the lines of the ocean and waves of water. Water is transparent, right? But as water gets deeper (and denser), your visibility of the ocean floor diminishes. Fog is one way to accomplish that, but it’s a bad, fake way of doing it; shaders are meant to let us fake less to get things looking more the way they are supposed to. Water also has a lot of particles in it (plankton, plants, toxic waste, you name it), things that “cloud up” the water, so you need some ‘noise’ to vary the density of your water with these particles. Ocean water is never standing still, so you need more noise for the directions that different layers of water are moving in (undertow, surface, low waves, waves crashing down), those sorts of movements. The movement of water also affects the density and ‘cloudiness’ of things in the water. When you start getting multiple volume (3D) textures and noise maps (2D and/or 3D) that need to be generated each frame (chances are by a fragment program), you would probably need the fragments accessible to both the vertex and fragment programs that will generate the final vertices and colors. I don’t see 32x32x32 vectors (32K or more) becoming available to a vertex program at one time any time soon, which is why a quick texture lookup (nearest would probably be fine, nothing fancy) would be nice.

I agree with you that it would probably complicate a lot of things beyond what they need to be. And there might be other ways of getting that volumetric data into the vertex program other than a texture, but that was my idea for wanting it: making a really cool-looking ocean without the cheesy fog.

Fragment programs making additional vertices: I was thinking along the lines of displacement mapping and particle generators, though I haven’t personally found a reason why I would need it. Other reasons might involve stuff like medical imaging applications, but I don’t know, I’m just speculating. (Hey, I could make that frothy stuff from the water [just messing with you ])

I will say that the command program would make it easier to run data through, but I’m thinking there might be problems with it (not conceptually, just implementation-wise). Instead of making individual calls for each dataset for each program, I think you would just be moving one of the CPU’s problems onto the GPU, that being functional overhead (and, if the data isn’t stored in VAR/VAO, data fetching as well). I don’t know how a GPU would handle that sort of thing; it would certainly require larger instruction and data caches on the GPU. Plus, I think there is a state change that happens in OpenGL when a new geometry type is being rendered, and when glBegin and glEnd are called, so you have that to deal with as well. And you might need a way of knowing when the rendering has finished, or when the vertices have finished being processed (aka a fence?). I don’t think you would want to send more data and instructions to the GPU if it is still processing the previous set (unless it’s going to cache it all). On the vertex programs, that includes all the constants and matrices and stuff too, right, or are those separate? Should I go to bed now that it is 12:40am, or keep mumbling about nothing

If they make it, I will bite


Texture lookups will be in some of the next versions. In hardware, you already get it today on every card with displacement mapping, so it’s just a matter of time until this gets programmable as well…

For correct shading of volumetric objects, just use raytracing/raycasting… preferably in the hardware…

I think the P10 will be one of the next GPUs we want to get at; it’s very programmable… (you can’t actually touch this yet… but they got displacement mapping into hardware after they shipped the card, just with the help of the programmability! Cute, no?)

What you’re speaking of is more commonly referred to as a primitive program. It could be used to do subdivision surfaces, to generate shadow volumes, or to do custom (depth-adaptive, for instance) tessellations. Sometimes it would be useful if the primitive program comes after the vertex program: for instance, you can use the vertex program to apply displacement mapping first and then a primitive program to calculate the shadow volume.

I have been advocating a programmable vertex “generator” for a while. I think it’ll take a bit before we get there, but that’s the obvious future extension. Currently, the general-purpose CPU serves that function, although I believe enough simplifications and optimizations can be made to justify this fourth programming model.

It would be nice if regular vertex programs could read look-up tables (nee “textures”), and that seems to be on the way in some future generation.

Regarding indirect addressing and very large register files: you could conceivably cache only part of the register file on the card, letting the rest sit in RAM. That would let you support (tens of) thousands of registers, if you wanted, at a certain complexity price.
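
A toy model of that partial-caching idea, assuming a direct-mapped cache for simplicity (real hardware would surely do something smarter; every name here is illustrative):

```c
#define CACHE_SLOTS 64

typedef struct { float value[4]; } Reg;   /* one 4-float register */

typedef struct {
    Reg        slot[CACHE_SLOTS];
    int        tag[CACHE_SLOTS];  /* which RAM register occupies each slot; -1 = empty */
    const Reg *ram;               /* the full register file, sitting in system RAM */
    int        misses;
} RegCache;

static void cache_init(RegCache *c, const Reg *ram)
{
    c->ram = ram;
    c->misses = 0;
    for (int i = 0; i < CACHE_SLOTS; ++i)
        c->tag[i] = -1;
}

/* Fetch register `index`: serve from the on-card cache on a hit,
   pull it over the bus from RAM on a miss. */
static const Reg *cache_fetch(RegCache *c, int index)
{
    int s = index % CACHE_SLOTS;
    if (c->tag[s] != index) {
        c->slot[s] = c->ram[index];
        c->tag[s]  = index;
        c->misses++;
    }
    return &c->slot[s];
}
```

The complexity price shows up as the miss traffic: a shader that strides through thousands of registers would thrash a small cache, which is presumably why the data is streamed up front today instead.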

I believe the current belief is that you shouldn’t be sending data if you’re not gonna use it, so an efficient transfer from RAM to on-board register file is faster. I’m wondering at what point that will change…

207 bone skeletons, anyone?

[This message has been edited by jwatte (edited 09-29-2002).]

What about hardware noise? I think programmable noise is a good thing to implement in hardware. It can be done with pixel shaders, but it is still too slow; noise done completely in hardware, running fast, could bring a big change to real-time graphics.

Originally posted by jwatte:
I have been advocating a programmable vertex “generator” for a while. I think it’ll take a bit before we get there, but that’s the obvious future extension. Currently, the general-purpose CPU serves that function, although I believe enough simplifications and optimizations can be made to justify this fourth programming model.

I don’t understand why they don’t just move to completely programmable DSPs of some sort with massive parallelism… You would basically code the on-card DSPs the same way you program the host CPU…

Instead of having separate pipes for primitive/vertex/fragment processing, just move to 1024 general-purpose DSPs: allocate maybe 16 to primitive processing, 64 to vertex processing, and the rest to fragment processing. That allows the workload to be finely tuned…

And most likely incredibly hard to implement in HW… But this is probably how the P10 works anyway… :)