Geometry programs

Well, let's discuss the need for, and a sample implementation of, geometry "generating" programs - I don't know what they are called officially.

Do we need something like this? Can you propose some usage areas? A software implementation of such a unit would be rather simple, I think. One could just take GLSL as a base and define all the glVertex* functions in it…

Some positive aspects I can see are a possible speed-up of custom model generation, surface generation and shadow volume construction (if anyone is going to use that anymore :slight_smile: )

As T101 wished, I'm adding a list here. It seems a bit overcomplicated to me though :slight_smile: Feel free to suggest edits.

1. Possible uses:
- Dynamic LOD - specifically thinking about terrain
- Higher order surfaces
- Shadow volume construction

2. Options for placement:
- A. Before vertex transform - this is more flexible, and may allow software emulation. But there is possibly a problem with display lists - if either, but not both, is implemented in hardware.
- B. After vertex transform - can probably only operate on triangles and possibly quads, but can benefit from parallelism, and because of its limited functionality it is simple to implement. But it cannot be done in software without also performing the vertex transforms in software.

3. Possible functionality:
- A1. Replace only evaluator functionality. Generate vertices for a single primitive type (probably triangle list or quad list), using current texture bindings.
- A2. Replace both display list and evaluator functionality. Pretty much anything that can be called from a display list, including recursion. Advantages: highly flexible, possibility to do preprocessing on the CPU, display list functionality can be implemented using a geometry program, clearly defined relation to display lists, suitable primitives can be started for certain types of geometry (e.g. automatic use of strips or fans). Disadvantages: complex to implement in hardware, possible byte-order issues between GPU and CPU, high resource requirements, possibly memory management issues when binding textures.
- B. (After vertex transform) Custom interpolation/warping in eye-space.

4. Input:
- A1. Vertex and attribute streams in object-space with attributes and parameters. Possibly one or more 1D/2D/3D/cubemap textures.
- A2. Buffer objects - to be initialised on the CPU (both for custom structures and for compiled display lists, if the driver so chooses).
- B. (After vertex transform) Vertex and attribute streams in eye-space with attributes and parameters. Possibly one or more 1D/2D/3D/cubemap textures.

5. Output:
- A1. Vertex stream in object-space.
- A2. Command and vertex stream in object-space.
- B. (After vertex transform) Edited eye-space vertex stream - possible additions/deletions.
Korval: I'm not sure where writing to vertex attributes would come in, possibly both A1 and B.

6. New constants/functions:
- GL_PRIMITIVE_SHADER - Overmind's suggestion for a primitive type to send vertices through the geometry program instead of using a standard primitive.
- glProcessGeometry(int first, int count) - Zengar's suggestion for calling the geometry program after using vertex pointers to set up the streams.

7. Proposed GLSL built-in types/functions/variables (no doubt just a subset of what will actually be required, very preliminary; both V-man's and Overmind's suggestions are in here):
Types:
- vertex - the position vector as well as all the attributes. Note: what would be the maximum number and type of the attributes? Also: could this be defined as a simple fixed struct? (A rough sketch follows after this list.)
Variables:
- vec4 gl_Vertex{123} (or a vec3) - for sending the coordinates of one triangle (V-man's suggestion - superseded by gl_Triangle[n], I think?)
- gl_Triangle[n] - triangle structure, containing data for triangles: int index1, index2, index3 - indices into the vertex array/VBO. Would that be the edited or the original array? If edited, how do you keep track? If not, how do you insert?
- vertex gl_Vertex[n] - to address the input vertex stream, with n=0 for the first vertex of the "current" triangle. Note: how do you advance the index?
- int gl_TriangleCounter - to keep track of the (total?) triangle count.
Functions:
- glBegin()
- glEnd()
- glNormal()
- glVertex()
- vertex interpolate(vertex v1, vertex v2, float amount) - interpolate between vertex attributes.
- void gl_Triangle(vertex v1, vertex v2, vertex v3) - emit a triangle with these three vertices (incl. attribs).
- ??? gl_ControlMesh(???) - some way of accessing parameters of the geometry program. V-man, if this is important, please elaborate.
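As a starting point for the "fixed struct" question above, here is a rough sketch of what the vertex type could look like - the attribute count and types are pure assumption, presumably capped by hardware limits:

struct vertex {
    vec4 position;
    vec3 normal;
    vec4 color;
    vec4 texCoord[8];   // up to some hw-defined maximum number of texture coordinate sets
};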

You mean stuff like programmable terrain LOD, custom tessellation of curved surfaces?

Sounds interesting.

But, bearing in mind that this is something that has historically been handled solely on the CPU, requiring complex data structures, what should the input to such a program be? Vertex arrays? Textures?

This could be used for higher order surfaces as well. The control mesh is sent to GL, and in the tessellator your code takes care of the rest.

In general terms, to output triangles I guess the tessellation program would have to write to built-in uniforms.

for(…)
{
    gl_Vertex1 = compute_this();
    gl_Vertex2 = compute_this();
    gl_Vertex3 = compute_this();
}

It requires more thought.

In general terms, to output triangles I guess the tessellation program would have to write to built-in uniforms.
It would output to the attributes, not the uniforms. It builds per-vertex attributes, and it is capable of accessing vertex data by itself to generate these attributes.

To answer my own question: the easiest and most flexible answer is probably just a block of memory, to be interpreted by your geometry program.
Probably with offsets from the start of the block instead of pointers - if something pointer-like is needed.

Byte ordering might be an issue though - on systems where the 3D hardware has a different ordering than the host CPU.

I suppose the geometry program would have to start primitives and queue up vertices for the vertex pipeline.

Binding (not uploading) textures would probably be a task that needs to be performed by the geometry program. But that may mean binding fragment programs too. Speaking of a can of worms…

The alternative would be a requirement to split your objects by texture - something that people are doing already - but that would kind of restrict the use of this.
For example, if you have multiple levels of detail, you might not even want to bind a texture that is only used by small parts of the object.

For generating geometry, I doubt parallel processing of the same object makes sense.
Multiple objects in parallel might make sense but would probably screw up the states further down the pipeline - and you can’t keep queueing up the output from an object, because eventually the queue would be full.

I do suspect you’d need support for “infinite” loops, and long programs.
And of course it might not be worth the chip space.

Wouldn’t a “primitive assembly” program make more sense? Ok, it wouldn’t be as general as a program that has an arbitrary memory block as input and a series of glVertex commands as output, but a GPU is no CPU :wink:

IMHO it doesn’t make sense to generate geometry really from scratch on the GPU, as this can’t benefit from parallelisation. Also, this would make vertex shaders redundant.

I’m thinking of a program that has a stream of vertices as input and a stream of triangles as output. This would replace the primitive assembly of the OpenGL fixed function pipeline.

The program would have to have the ability to access the last n vertices of the stream and it would need some mechanism to insert new vertices into the stream.

Something like this (mimicking the GL_TRIANGLE_STRIP functionality, pseudocode):

void main()
{
    gl_Triangle(gl_Vertex[-2], gl_Vertex[-1], gl_Vertex[0]);
}

Or mimicking GL_TRIANGLES:

void main()
{
    gl_Triangle(gl_Vertex[0], gl_Vertex[1], gl_Vertex[2]);
}

A positive index access would mean the i-th vertex of the current primitive, a negative index one of the previous vertices (meaning overlap between primitives). So a program accessing indices e.g. from -1 to 1 would be called every 2nd vertex, starting from the 3rd, with one vertex (the -1st) overlapping between primitives.
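To make the indexing concrete, here is how the GL_TRIANGLE_STRIP-style program above would see a five-vertex stream - just a worked example of the rule described above (it ignores the winding-order flip a real strip performs on every other triangle):

// invocation 1 (at v2): gl_Vertex[-2]=v0, gl_Vertex[-1]=v1, gl_Vertex[0]=v2  ->  emits (v0, v1, v2)
// invocation 2 (at v3): gl_Vertex[-2]=v1, gl_Vertex[-1]=v2, gl_Vertex[0]=v3  ->  emits (v1, v2, v3)
// invocation 3 (at v4): gl_Vertex[-2]=v2, gl_Vertex[-1]=v3, gl_Vertex[0]=v4  ->  emits (v2, v3, v4)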

The program would also need the ability to generate new vertices:

void main()
{
    vertex v1, v2, v3;

    // create one new vertex on each edge of the input triangle
    v1 = interpolate(gl_Vertex[-2], gl_Vertex[-1], 0.5);
    v2 = interpolate(gl_Vertex[-1], gl_Vertex[0], 0.5);
    v3 = interpolate(gl_Vertex[0], gl_Vertex[-2], 0.5);

    // emit the four sub-triangles of the subdivided triangle
    gl_Triangle(gl_Vertex[-2], v1, v3);
    gl_Triangle(v1, gl_Vertex[-1], v2);
    gl_Triangle(v1, v2, v3);
    gl_Triangle(v3, v2, gl_Vertex[0]);
}

“interpolate” would simply interpolate all vertex attributes. If something different is desired, like replacing the position of the interpolated vertex (e.g. displacement mapping), this can easily be done with a vertex program and vertex textures. The program could still have access to the vertex attributes - this could be useful for dynamic LOD - but generating new vertices by interpolating might be better for the hardware.
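For illustration, a minimal vertex shader along those lines might look something like this - the sampler and scale names are made up, and it assumes vertex texture fetch is available and that gl_MultiTexCoord0 parameterises the surface:

uniform sampler2D displacementMap;   // hypothetical height texture
uniform float displacementScale;

void main()
{
    // fetch a height value and push the (possibly interpolated) vertex along its normal
    float height = texture2DLod(displacementMap, gl_MultiTexCoord0.st, 0.0).r;
    vec4 displaced = gl_Vertex + vec4(gl_Normal * height * displacementScale, 0.0);
    gl_Position = gl_ModelViewProjectionMatrix * displaced;
    gl_TexCoord[0] = gl_MultiTexCoord0;
}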

I think this would be a lot easier to implement in hardware, and it could benefit from parallel processing. And this would not be much less general purpose than a real geometry generating program.

In the extreme case of accessing all vertices in a single execution of the program it is equivalent to a geometry generating shader (when you combine it with a vertex shader), but then of course it can't be executed in parallel. Of course the implementation could limit the maximum vertex count that is accessible by the program, so this method is both extensible and hardware friendly (not that I know much of hardware, feel free to correct me :rolleyes: ).

T101, it would not work that way - or at least I envisioned it differently.

The tessellation program is executed for each patch (grid of control points).
There is no issue with offsets and byte order.
Your program evaluates triangles and spits them out.

You can access the control points as attributes of the tessellation program (gl_ControlMesh). Of course, there have to be hw limits here.

The triangles are output to the built-in variables of the tessellation program (gl_Triangle). There has to be a hw limit here as to how many you can generate.
I think a counter of triangles is also needed.

Maybe something like this

vec4 compute_this_vertex()
{
    .......
    gl_ControlMesh[..]
    .......
}

void main()
{
    gl_Triangle[0].vertex[0] = compute_this_vertex();
    gl_Triangle[0].vertex[1] = compute_this_vertex();
    gl_Triangle[0].vertex[2] = compute_this_vertex();
    gl_Triangle[0].texcoord0[0] = compute_this_texcoord();
    gl_Triangle[0].texcoord0[1] = compute_this_texcoord();
    gl_Triangle[0].texcoord0[2] = compute_this_texcoord();

    gl_Triangle[1].vertex[0] = compute_this_vertex();
    gl_Triangle[1].vertex[1] = compute_this_vertex();
    gl_Triangle[1].vertex[2] = compute_this_vertex();
    gl_Triangle[1].texcoord0[0] = compute_this_texcoord();
    gl_Triangle[1].texcoord0[1] = compute_this_texcoord();
    gl_Triangle[1].texcoord0[2] = compute_this_texcoord();

    gl_TriangleCounter = 2;
}

Like I said, it requires thought and consideration for flexibility.

Originally posted by V-man:
Your program evaluates triangles and spits them out.

You can access the control points as attributes of the tessellation program (gl_ControlMesh).

Wouldn't the evaluation be possible in the vertex shader? This can already be done with today's OpenGL functionality, with the only drawback that you have to send every vertex manually, instead of only the corners. The evaluation of parametric meshes is not the problem - this is already available. The tessellation is the problem.

My proposed solution addresses exactly this. I'm sure there are enough problems with it, but I think the concept makes sense :wink:
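To illustrate the "evaluation in the vertex shader" part, a minimal sketch could look like this. It only evaluates a bilinear patch from four control points passed as a uniform array (a real evaluator would use the proper basis functions), and it treats gl_Vertex.xy as the (u,v) parameter pair:

uniform vec3 controlPoints[4];   // hypothetical 2x2 control grid

void main()
{
    float u = gl_Vertex.x;
    float v = gl_Vertex.y;
    // evaluate the patch at (u, v)
    vec3 bottom = mix(controlPoints[0], controlPoints[1], u);
    vec3 top    = mix(controlPoints[2], controlPoints[3], u);
    gl_Position = gl_ModelViewProjectionMatrix * vec4(mix(bottom, top, v), 1.0);
}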

Well, we may still use vertex arrays to feed this sort of unit, but stick to generic attributes.

For example, something like this will emulate indexed vertex arrays

attribute vec3 vertices;
attribute vec3 normals;
attribute ivec3 indices;

void main()
{
    glBegin(GL_TRIANGLES);
    for(int i = 0; i < length(indices); i++)
        for(int n = 0; n < 3; n++)
        {
            glNormal(normals[indices[i][n]]);
            glVertex(vertices[indices[i][n]]);
        }
    glEnd();
}

I can imagine a lot of interesting things one can do with this… The main idea is to offload the CPU. Also, such a program would write directly to GPU memory, thus speeding the whole process up. Tessellation etc. could be done completely on the GPU.
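Just to sketch how the C side of that could look with the glProcessGeometry() idea from the list above - everything here is hypothetical, and attribute locations 0-2 are assumed to correspond to vertices, normals and indices in the program:

/* set up the three streams as generic attribute arrays */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, vertexData);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, normalData);
glVertexAttribPointer(2, 3, GL_INT,   GL_FALSE, 0, indexData);
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);

/* hypothetical entry point: run the bound geometry program over the streams */
glProcessGeometry(0, elementCount);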

Originally posted by Overmind:
Wouldn't the evaluation be possible in the vertex shader? This can already be done with today's OpenGL functionality, with the only drawback that you have to send every vertex manually, instead of only the corners. The evaluation of parametric meshes is not the problem - this is already available. The tessellation is the problem.

My proposed solution addresses exactly this. I'm sure there are enough problems with it, but I think the concept makes sense :wink:

No, it's not possible with VS. A vertex shader processes a single vertex.

A TS’s job is to generate tris. You write your own algorithm and decide on the LOD in the program itself.
Notice the gl_Triangle structure that I used.
My example is outputting 2 triangles (assuming the hw can handle 2).

A primitive assembly program is also interesting, but it has its own issues.
How do you select vertices to form your tris in a way that would make sense?

I guess you would have to access the index array.

If you think about doing LOD with raw mesh data, it becomes complicated.
Evaluators naturally solve this problem.

Originally posted by V-man:

A primitive assembly program is also interesting, but there it has its own issues.
How do you select vertices to form your tris in a way that would make sense?

How do you mean that?

Originally posted by V-man:
No, it’s not possible with VS. A vertex shader processes a single vertex.
That’s where you’re wrong.

A TS’s job is to generate tris.

Exactly what I’m talking about. Its job is to generate triangles, and to do this it also needs to insert additional vertices. But its job is not to transform vertices, that’s the job of a vertex shader (evaluators are just special non-linear transformations).

How do you select vertices to form your tris in a way that would make sense?

How do you select the vertices? The only difference between my proposed solution and yours is that mine generates triangles out of existing vertices or new vertices as combinations of old ones. These can be modified by a vertex shader if necessary. Your solution computes all vertex attributes and then outputs triangles with these, thus leaving no work for the vertex shader. Functionally this is equivalent, but my solution still needs the vertex shader, while yours does not.

To clear something up: Indexed vertex arrays are not something I’d like to replace/implement by this shader. My proposal has as input the vertex stream, that is, with indexed access already resolved. But that’s not the point…

Notice the gl_Triangle structure that I used.

Notice the gl_Triangle builtin call that I used :smiley:

These are really the same thing, just different syntax. But I don’t like the idea of being limited in the triangle count that I can output from a geometry shader. How would you draw a NURBS surface? What would be the input and how often would the program be called?

In my solution it would be the following:

uniform int STEPS;

void main()
{
    int i, j;
    vertex v0, v1, v2, v3;
    vertex h0, h1, h2, h3;

    // gl_Vertex[0..3] are the four corner vertices of the patch
    v2 = gl_Vertex[0];
    v3 = gl_Vertex[1];
    for(i = 1; i <= STEPS; i++) {
        v0 = v2;
        v1 = v3;
        v2 = interpolate(gl_Vertex[0], gl_Vertex[2], float(i)/float(STEPS));
        v3 = interpolate(gl_Vertex[1], gl_Vertex[3], float(i)/float(STEPS));

        h2 = v0;
        h3 = v2;
        for(j = 1; j <= STEPS; j++) {
            h0 = h2;
            h1 = h3;
            h2 = interpolate(v0, v1, float(j)/float(STEPS));
            h3 = interpolate(v2, v3, float(j)/float(STEPS));

            gl_Triangle(h0, h1, h2);
            gl_Triangle(h2, h1, h3);
        }
    }
}

(Perhaps this code could be simplified with an interpolate2D builtin. But that's just syntactic sugar :wink: )
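If it existed, that interpolate2D sugar could be little more than this (a sketch, nothing official - it just builds on the proposed interpolate() and vertex type):

// bilinear interpolation between four vertices
vertex interpolate2D(vertex v00, vertex v10, vertex v01, vertex v11, float u, float v)
{
    return interpolate(interpolate(v00, v10, u), interpolate(v01, v11, u), v);
}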

Together with the same vertex program as above (the control mesh could be submitted either as uniforms or as vertex textures), and the following C program (setup code omitted):

glBegin(GL_PRIMITIVE_SHADER);
glVertex2f(0.0, 0.0);
glVertex2f(0.0, 1.0);
glVertex2f(1.0, 0.0);
glVertex2f(1.0, 1.0);
glEnd();

This would also make it possible to render only part of the surface, by submitting for example only the edge vertices (0, 0)-(0.5, 0.5), without change to the program. Or to draw a non-rectangular portion of the surface…
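For the (0, 0)-(0.5, 0.5) example, the only change would be on the C side - same program, just a smaller parameter rectangle:

glBegin(GL_PRIMITIVE_SHADER);
glVertex2f(0.0, 0.0);
glVertex2f(0.0, 0.5);
glVertex2f(0.5, 0.0);
glVertex2f(0.5, 0.5);
glEnd();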

And dynamic LOD wouldn’t be a problem either. Just don’t generate a regular grid like I did in my example, but do something more sophisticated :wink:

I'm not so sure that vertex shaders would become totally redundant. There are still tangents/binormals, and skinning, which might be more efficiently done after the choice of vertices has been made - and that could still be performed in parallel.

By the way, wouldn’t a geometry program by definition have to generate coordinates in object space rather than camera space?

Like I said, it sounds interesting, and in my mind terrain rendering algorithms like ROAM on the GPU sounded like a possible application, especially since you can’t really use such detail reduction for tasks that have nothing to do with rendering (like collision detection). But it sounds like a hard task to move only part of a continuous LOD algorithm to the GPU.

I’ll admit that your suggestions are more practical though.

(PS: I’m currently working on terrain for my engine - can you tell…)

That’s where you’re wrong.
No, vertex shaders really do operate on only one vertex. One vertex in, one vertex out.

All you're doing in your NURBS implementation is sending lots of vertices, doing the tessellation on the CPU rather than the GPU.

Primitive processing, in whatever form, doesn't make vertex shaders obsolete. Vertex shaders will always be more efficient; as has been pointed out, it is much easier to parallelize vertex shaders than arbitrary primitive processing from raw sources in memory.

The reason vertex shaders alone can't do the job is that they only get one set of vertex data. What you suggest is having some kind of fixed-function tessellator that tessellates geometry in some way, and then deciding what the positions should be in the shader. That doesn't work, because in order to determine what the tessellated attributes should be, one needs access to all 3 sets of vertex attributes as well as the barycentric interpolants for that particular vertex. And vertex shaders simply don't have that information available.

Look at my second NURBS implementation. I didn't say the whole implementation is possible with vertex shaders, I said the evaluation part is possible. The tessellation is done in a geometry shader; the evaluation can be done in a vertex shader. There's no need to have evaluation in the geometry shader - as a few people have already pointed out, it's more efficient in the vertex shader because of parallelisation…

I'd actually be more interested in the hardware side of things, the architecture. It seems like if you simply subsume another stage of the fixed-function pipeline (a reasonable course of action), the API should be rather easy to expose, as it was for the vertex and pixel stages. The operations carried out by each stage are well defined in the spec.

The alternative, I suppose, is getting into an ASICs-type scenario, where you could effectively extend feature sets by adding chips. As I understand it, this could get messy and expensive.

It seems to me that the real trick is going to be making the hardware, and making it cheap.

How do you mean that?
Think about it. You have an array of vertices. You want to create triangles from these. Which vertices do you select?
With index-based rendering, the indices are your guide to forming triangles.
Or you use some other strategy, such as treating the vertex stream as just GL_TRIANGLES.

How would you draw a NURBS surface? What would be the input and how often would the program be called?
I think it would make sense if the TS were executed once per control mesh.
The size of the control mesh would be limited by the hw.

In your example you are doing some interpolation in your TS, yet you say that my solution takes over the vertex shader's job. Yours is no different.

I'm surprised no one mentioned the dimensions of the control mesh. I was expecting someone to say something about the control mesh having to be a rectangular grid.
I think the control mesh should be a 1D array. How your program interprets the contents is up to you.

So if your control mesh is 3 points and you want to interpret it as a triangle that you wish to tessellate, that is possible.

Well, the reason I didn’t suggest a rectangular grid of vertices is that it doesn’t make much sense.
It’s neither efficient - a 2D float (or even luminance) texture would be both more memory-efficient and could benefit from texture sampling hardware - nor is it suitable for more complex surfaces.

In my opinion a triangle stream is probably the most practical, due to the simplicity of working with triangles, though the “block of memory” would be the most flexible and would allow for more preprocessing.

Bonehead: I’m not sure what other plans exist with respect to geometry programs - I suppose there might be some plans in the DX world.
But it seems a bit early to think about how to implement something without having decided what it should do.
It seems natural to assume it will feed vertices into the pipeline, but how to interpret those vertices, what else it should do, and where it should reside relative to polygon splitting (quads into tris) and evaluators are all factors in that hardware implementation.
Needless to say, it does have to remain practical.

No, you completely misunderstood my proposal.

I create new vertices by interpolation; that's analogous to the way fragments are created by interpolation out of vertices, just a few steps earlier in the pipeline. This is functionality that is not available in a vertex shader. You explicitly calculate per-vertex attributes, which is already possible in the vertex shader.

As for expressiveness - that is, what you can do with the program - I think my method + vertex shaders is exactly as powerful as your method without vertex shaders (ignoring performance, because performance considerations are pure speculation either way).

I said absolutely nothing about a control mesh, because the control mesh is outside the scope of this program. The concept of a control mesh is something that has nothing to do with the programmability, it is one particular aspect of one particular application. In my solution, when doing NURBS, you need a control mesh in the vertex shader, and there it can be accessed with any method you like, and have any form you like (uniform array, 1D-texture, 2D-texture, heck, even a 3D-texture or cubemap if you want and if you think this would make sense).

On one point you're right though. The selection of vertices really has some room left for improvement. My proposed solution doesn't really address this; I concentrated on the other end of the program, that is, the generation of new vertices and triangles.

V-man, look at my pseudoshader code.

I pass three streams in: a vertex position stream, a normal stream and an index stream. The index stream is used to look up normals and vertices - essentially glDrawElements done by hand.
My idea is that the geometry shader is the one that generates primitives. The application simply feeds it some data. This data could be vertices, NURBS, other curved surfaces etc. Last but not least, one could just input a radius and let the geometry shader generate a sphere! The GPU unit would be a kind of general-processing unit with a cache and fast vector arithmetic, and shouldn't take much space. Also, this unit doesn't have to be very fast.

The pipeline would be:

input data -> geometry shader -> (vertices) -> vertex shader -> rasterisation -> fragment shader -> the rest of the pipeline

It would be pretty easy to do this in software while outputting primitives to a VBO, but the performance would of course suffer.
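A very rough sketch of that software path, assuming a run_geometry_program_on_cpu() routine and a simple position-only Vertex struct (both made up for the example):

typedef struct { float x, y, z; } Vertex;

/* allocate the destination VBO and let the CPU-side "geometry program" fill it */
glBindBuffer(GL_ARRAY_BUFFER, generatedVbo);
glBufferData(GL_ARRAY_BUFFER, maxVertices * sizeof(Vertex), NULL, GL_STREAM_DRAW);

Vertex *out = (Vertex *) glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
int written = run_geometry_program_on_cpu(inputData, out, maxVertices);
glUnmapBuffer(GL_ARRAY_BUFFER);

/* draw whatever was generated through the normal vertex pipeline */
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (void *) 0);
glEnableClientState(GL_VERTEX_ARRAY);
glDrawArrays(GL_TRIANGLES, 0, written);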