Vertex and fragment programs

I’d like to play around with vertex and particularly pixel shaders (GL_ARB_vertex_program and GL_ARB_fragment_program). Could anyone tell me where I can find some tutorials or good documentation on the ARB vertex and fragment programs in particular, please?

Also, is there any significant difference between using GL_ARB_vertex_program and GL_ARB_fragment_program and using GL_NV_vertex_program and GL_NV_texture_shader (on NVIDIA cards)?

Yes, there is a difference between the two modes.
They differ in internal precision, which leads to differences in performance.

If you have an NVIDIA chipset, a good place to start is NVIDIA’s OpenGL SDK (60 MB). There are tons of demos with full source code included, lots of vertex/fragment programs, and most of the demos use GLUT.


There is almost no difference between ARB_vertex_program and NV_vertex_program, be it in performance or in functionality. Some may say that one is better than the other, but overall they’re plain equivalent.

ARB_fragment_program and NV_texture_shader are totally different extensions. ARB_fragment_program is a superset of NV_texture_shader functionality (except fragment culling and depth replacement), so you cannot really compare them. At best you could compare the NV_texture_shader+NV_register_combiners combo with ARB_fragment_program.

NV_texture_shader is supported by GeForce3 and GeForce4 Ti cards and above, and is NVIDIA-specific. ARB_fragment_program is supported by GeForceFX cards and Radeon9500 series and may also be supported by other vendors.

I’ve had a look through the nvidia sdk documents and I’m still not sure whether I can achieve the effect through the texture shader and register combiners.

What I’m trying to achieve is to render a scene so that for every input pixel, if the sum of the Red, Green, and Blue components is below a threshold, then the Alpha component is forced to zero. The ARGB is then used for blending with the frame buffer destination pixel as normal.
Is it possible to do this?

Yes, you can do that with register combiners (you don’t need texture shaders).

To get the sum of Red, Green and Blue components, compute a dot product between (R,G,B) and (1,1,1). You may want to divide the result in order to fit in the [0,1] range.

Then, for the threshold, use the MUX instruction, which compares the alpha component of the spare0 register with 0.5.
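As a CPU-side reference for the arithmetic described above (this is not register combiner setup code), here is a minimal C sketch. The type `Rgba` and the functions `normalized_rgb_sum` and `threshold_alpha` are hypothetical names made up for illustration, and the division by 3 is just one way to keep the sum in the [0,1] range.

```c
/* Hypothetical pixel type for illustration; components are assumed to be in [0,1]. */
typedef struct { float r, g, b, a; } Rgba;

/* Dot product of (R,G,B) with (1,1,1), divided by 3 so the result stays in [0,1]. */
float normalized_rgb_sum(Rgba p)
{
    float sum = p.r * 1.0f + p.g * 1.0f + p.b * 1.0f;
    return sum / 3.0f;
}

/* Forces alpha to zero when the normalized sum is below the cutoff;
   otherwise the pixel passes through unchanged. */
Rgba threshold_alpha(Rgba p, float cutoff)
{
    Rgba out = p;
    if (normalized_rgb_sum(p) < cutoff) {
        out.a = 0.0f;
    }
    return out;
}
```

Calling `threshold_alpha(pixel, 0.5f)` corresponds to the fixed 0.5 comparison that MUX performs; a different threshold would have to be folded into the scaling of the sum before the comparison.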

It should be done in two general combiner stages, so every GeForce card could do it.

You can still apply fog (or another operation) in the final combiner stage.

If you want your objects to be textured or multitextured, you may need more combiner stages, and then you would need a GeForce 3 or 4 Ti at least.

[This message has been edited by vincoof (edited 04-29-2003).]

That’s helped lots, but I’m still struggling; I’ve only got nvidia’s combiners.pdf (“GeForce 256 and Riva TNT Combiners - How to best utilize the per-pixel operations using OpenGL on nvidia graphics processors”) document to go by, which I find a little too brief.

From what I can tell, it looks as though I can achieve this effect in no less than four stages -

stage 0 //Modulate texture by base colour
RGB portion -
primary col RGB -> A
tex0 RGB -> B
AB output to primary col RGB //texture RGB modulated by base RGB
Alpha portion -
primary col Alpha -> A
tex0 Alpha -> B
AB output to primary col Alpha //texture alpha modulated by base alpha

stage 1 //Calculate modulated RGB’s R+G+B total (in 0 to 1 range)
RGB portion -
primary col RGB -> A
constant col 0 RGB (set to [0.33,0.33,0.33]) -> B
A.B (R+G+B in 0 to 1 range) output to texture0.B //Can’t output to tex0.A, used by the ‘mux’
Alpha portion -

stage 2 //Get the R+G+B value into texture0.Alpha
RGB portion -
Alpha portion -
texture0.Blue -> A //Can read the Blue component of the RGB, apparently
constant col 1.Alpha (1.0) -> B
AB output to texture0.Alpha

stage 3 //Do the threshold test: If R+G+B normalised total > 0.5, leave the alpha unchanged, otherwise set it to 0
RGB portion -
primary col Alpha -> A
constant col 1 Alpha (1.0) -> B
zero Alpha -> C
zero Alpha -> D
AB mux CD output to texture 1.Blue //Can’t output it to primary col.Alpha so this will have to do
Alpha portion -

Final combiner stage
primary col RGB -> D
texture 1.Blue -> G
zero -> A,B,C,E,F
Final RGB out = D = texture RGB modulated by base RGB
Final Alpha out = G = 0 or texture alpha modulated by base alpha, depending on R+G+B (0 to 1) > 0.5

Can I really do this in only two stages?
Can anyone tell me of any other documents going into further detail on the combiners?

If I understand correctly, this operation does not need the mux instruction, and probably not even the register combiners functionality.

You want to discard fragments whose alpha components are lower than a certain value, is that right?

Also, when you say “if the sum of the Red, Green, and Blue components is below a threshold, then the Alpha component is forced to zero”, what happens to the alpha component if the sum is not below the threshold?

I want to discard fragments with an RGB total below a certain threshold, by forcing the alpha to 0 on all fragments with a low RGB. Those with a high enough R+G+B pass through unchanged, along with their unchanged Alpha.

I believe it can only be done with 3 combiner stages -
0: Sum texture RGB (write to spare0)
1: Write sum in spare0 to tex0.A
2: mux (using tex0.A) threshold test, resulting in either 0 or the original fragment alpha

However, in attempting to implement this, I’ve got a question -
How do I set the stage0 output to specifically set a particular register component (eg. Blue)? I want to do this -
spare0.B = A.B
where A = R,G,B
and B = 0.333,0.333,0.333
When using “glCombinerOutputNV(stage,portion,abOutput,cdOutput,sumOutput,scale,bias,abDotProduct,cdDotProduct,muxSum);”
I set ‘abOutput’ to GL_SPARE0_NV and ‘abDotProduct’ to GL_TRUE. What will this give me? Will I end up with -
spare0.R = R*0.333 + G*0.333 + B*0.333
spare0.G = R*0.333 + G*0.333 + B*0.333
spare0.B = R*0.333 + G*0.333 + B*0.333
How can I get the dot product output to go to one component only (eg. spare0.B)?

With register combiners, you cannot set up write masks. The dot product will be written to all three components. But what is the problem? Just don’t use the R and G components, that’s all. It would not be faster to output to a single component anyway.
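To make the replication concrete, here is a small C model of an RGB-portion dot-product output. It is only a sketch of the behaviour described above: `Rgb` and `dot_product_output` are hypothetical names, and the input mappings, scale, bias and clamping that the real hardware applies are left out.

```c
/* Hypothetical three-component register for illustration. */
typedef struct { float r, g, b; } Rgb;

/* Models a dot-product output in the RGB portion: the scalar A.B is
   replicated into R, G and B, since there is no per-component write mask. */
Rgb dot_product_output(Rgb a, Rgb b)
{
    float d = a.r * b.r + a.g * b.g + a.b * b.b;
    Rgb out = { d, d, d };
    return out;
}
```

With A = (R,G,B) and B = (0.333,0.333,0.333), every component of the result holds the same normalized sum, so reading only the blue component later gives the value that was wanted in spare0.B.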

Thanks very much for the help. I think I understand it all now.
Having attempted to implement this, the result is not what I’d have expected. On closer inspection, I’ve noticed a contradiction in the nvidia documentation. One document says -
mux(AB,CD) = ( spare0.Alpha <= 0.5 ) ? AB : CD
the other says -
mux(AB,CD) = ( texture0.Alpha > 0.5 ) ? AB : CD

Which one is correct?

It appears to be -
mux(AB,CD) = ( spare0.Alpha <= 0.5 ) ? AB : CD
which is the correct version.
I’ve got it all working correctly now.

The MUX instruction is based on the alpha component of the spare0 register.
But because the spare registers are theoretically not initialized for the first combiner stage, the alpha component of the spare0 register is initialized with the alpha component of the texture0 register. This ensures that spare0.alpha is always defined, and therefore that the mux instruction is always defined. So there is one case where the mux instruction does use texture0.alpha: when you use the mux instruction in the first combiner stage.
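The selection rule and the initialization can be modelled on the CPU like this. It is a sketch under assumptions: `CombinerRegs`, `initial_registers` and `combiner_mux` are hypothetical names, and the comparison uses the `<= 0.5` form quoted earlier in this thread, so the behaviour at exactly 0.5 should be checked against the NV_register_combiners specification.

```c
/* Hypothetical subset of the combiner register set, alpha components only. */
typedef struct {
    float texture0_alpha;
    float spare0_alpha;
} CombinerRegs;

/* Before the first general combiner stage, spare0.alpha takes its
   value from texture0.alpha, so it is always defined. */
CombinerRegs initial_registers(float texture0_alpha)
{
    CombinerRegs regs;
    regs.texture0_alpha = texture0_alpha;
    regs.spare0_alpha = texture0_alpha;
    return regs;
}

/* mux(AB,CD) as quoted in the thread: AB when spare0.alpha <= 0.5, else CD. */
float combiner_mux(float spare0_alpha, float ab, float cd)
{
    return (spare0_alpha <= 0.5f) ? ab : cd;
}
```

For the thresholding effect, this means putting zero in AB and the original fragment alpha in CD, with the normalized R+G+B sum sitting in spare0.alpha when the mux stage runs.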