ARB_Fragment_Shader

Originally posted by vshader:
btw, following ATI’s lead, how about GL_NV_text_textureshader_registercombiners(using the new ARB program interface)?

I think that’s a good idea too. That would allow me to clean up a little in my framework.

that’s about the reverse of the cg way:
first, make a good base,
then make as much as possible work on that base, even where there is no full support…

i hope nvidia will create some GL_NV_DX_text_pixel_shader1_3 in… would be very cool…

oh, and a GL_NV_register_combiners would be cool, too… for gf1 and gf2, ya know…

they could do that easily; internally they’d only need to call nvparse

looking for a bright future…

oh please, no nvparse syntax, just good old asm :stuck_out_tongue:

Originally posted by MZ:
oh please, no nvparse syntax, just good old asm :stuck_out_tongue:

a) i prefer the nvparse syntax for the register combiners, as it’s more natural to the combiners and makes it clearer how to use them exactly (rc hw design sucks)

b) GL_NV_DX_text_pixel_shader1_3 was the name i gave it… guess what syntax i would like for gf3 and gf4?!? yes, there is a DX in there, and yes, there is pixelshader1.3 in there…

THANK YOU GOD

Originally posted by davepermen:
GL_NV_DX_text_pixel_shader1_3 was the name i gave it… guess what syntax i would like for gf3 and gf4?!? yes, there is a DX in there, and yes, there is pixelshader1.3 in there…

hate to burst your bubble, but considering microsoft’s IP claim on ARBvp (which looks nothing like DX vertex shaders now except a few of the ops and their 3-letter codes), do you think they’re gonna allow GL_NV_DX_text_pixel_shader_1_3?

i agree it’s a good idea to combine the texture shaders and rc’s in one program, but i can’t see MS allowing an openGL implementation of one of the best things going for DX8.

afterthought: although, didn’t NV co-design the DX8 vs and ps languages with MS? still, i don’t think they could do it without MS licensing…

vshader,
If what you said were true, then MS would “disallow” both ATI_text_fragment_shader and ATI_fragment_shader as well, since they are based on DX PS 1.4

they are based on the same technology, but they don’t use the same syntax… there’s no “SampleMap” in ps1.4

but really, i don’t know.

i’m just guessing that if MS can claim IP on ARBvp, then you’d have to think they could do so (and more) on an openGL extension that was just a copy of the DX8 ps1.3 spec.

edit: and btw, isn’t it more that ps1.4 is based on ATI technology? again i’m guessing, but ps1.0 - 1.3 were basically written around the GF3 spec, and i assumed that lobbying from ATI got ps1.4 in DX8.1 so the API wasn’t so biased towards nVIDIA.

i’d be really interested if anyone knows the full story behind that … how much was written to the hardware, rather than hardware being made to the spec?

[This message has been edited by vshader (edited 09-24-2002).]

ATI’s ATI_fragment_shader adds some capabilities to "fragment program"s that aren’t possible in DX8.1/PS1.4.

While ATI_fragment_shader is missing the capability of depth output, there could be a good reason for it. Looking at the OpenGL ‘machine’, and taking what I know about optimizing fragment generation, I think this is where you run into a problem. ATI’s “HyperZ” may run afoul of a “depth fragment program” on OpenGL: I suspect the color (RGBA) portion of a fragment isn’t generated until after it passes the scissor, stencil, and depth tests. Just my speculation; maybe Evan or Jason or some other ATI guys/gals could shed more light.

But the great thing about ATI_fragment_shader is the source register modifiers, which aren’t available with DX8.1/PS1.4. The 2X_BIT_ATI, BIAS_BIT_ATI, COMP_BIT_ATI, and NEGATE_BIT_ATI are really nice features to have, for example when expanding a range-compressed vector ([0,1] -> [-1,1]). Normally (DX8.1/PS1.4), this would require an entire shader op just to blow it up. In OpenGL (ATI_fragment_shader), you just use GL_2X_BIT_ATI|GL_BIAS_BIT_ATI as the source modifier and you have your expanded vector (you could even add GL_NEGATE_BIT_ATI if you needed the opposite direction too). And you can do that for all 3 registers in a 3-register op, and still have destination modifiers.
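A minimal sketch of that inline expansion, assuming a normal map sampled into GL_REG_0_ATI and an interpolated light vector arriving through texture coordinate set 1 (the register/coordinate layout is purely illustrative):

```c
/* ATI_fragment_shader sketch: DOT3 bump term with inline range expansion.
 * Assumes the extension's entry points and tokens are already loaded. */
glBeginFragmentShaderATI();

/* Fetch the range-compressed normal into a register, and route the
 * interpolated light vector through as a texture coordinate. */
glSampleMapATI(GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI);
glPassTexCoordATI(GL_REG_1_ATI, GL_TEXTURE1_ARB, GL_SWIZZLE_STR_ATI);

/* 2X|BIAS expands each [0,1] source to [-1,1] inside the DOT3 itself,
 * so no separate expansion instruction is spent. */
glColorFragmentOp2ATI(GL_DOT3_ATI,
                      GL_REG_0_ATI, GL_NONE, GL_SATURATE_BIT_ATI,
                      GL_REG_0_ATI, GL_NONE, GL_2X_BIT_ATI | GL_BIAS_BIT_ATI,
                      GL_REG_1_ATI, GL_NONE, GL_2X_BIT_ATI | GL_BIAS_BIT_ATI);

glEndFragmentShaderATI();
```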

While it would be nice to have the depth capabilities, I think I’d rather have the source register modifiers, seeing as they add much more value (to me at least) to fragment generation.

Dan

Wouldn’t having a depth fragment output totally ruin the early z test / hyperz optimizations? Maybe the shader compiler can determine whether or not the program is writing to the depth component and optionally enable/disable the z test optimizations.
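A purely illustrative sketch of that compiler-side check; every name here is a hypothetical driver internal, not a real API:

```c
/* Hypothetical: the shader compiler flags whether a fragment program writes
 * its own depth, and the driver keeps early-Z style rejection enabled only
 * when it doesn't. */
typedef struct {
    int writes_depth;   /* set at compile time if the program outputs depth */
    /* ...rest of the compiled program state... */
} CompiledFragmentProgram;

static void configure_depth_rejection(const CompiledFragmentProgram *prog,
                                      int *early_z_enabled)
{
    /* If depth comes out of the program, the depth test result isn't known
     * until after shading, so early rejection has to be turned off. */
    *early_z_enabled = !prog->writes_depth;
}
```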

Originally posted by Dan82181:
But the great thing about ATI_fragment_shader is the source register modifiers, which aren’t available with DX8.1/PS1.4. The 2X_BIT_ATI, BIAS_BIT_ATI, COMP_BIT_ATI, and NEGATE_BIT_ATI are really nice features to have, for example when expanding a range-compressed vector ([0,1] -> [-1,1]). Normally (DX8.1/PS1.4), this would require an entire shader op just to blow it up. In OpenGL (ATI_fragment_shader), you just use GL_2X_BIT_ATI|GL_BIAS_BIT_ATI as the source modifier and you have your expanded vector (you could even add GL_NEGATE_BIT_ATI if you needed the opposite direction too). And you can do that for all 3 registers in a 3-register op, and still have destination modifiers.

While it would be nice to have the depth capabilities, I think I’d rather have the source register modifiers, seeing as they add much more value (to me at least) to fragment generation.

Dan

Uh… I am almost 100% certain D3D PS1.4 includes all the source and destination modifiers that you mentioned.

Originally posted by fresh:
Wouldn’t having a depth fragment output totally ruin the early z test / hyperz optimizations? Maybe the shader compiler can determine whether or not the program is writing to the depth component and optionally enable/disable the z test optimizations.

you answered yourself…

Originally posted by sqrt[-1]:
Uh… I am almost 100% certain D3D PS1.4 includes all the source and destination modifiers that you mentioned.

I know DX has the destination modifier, but the only source modifier I’ve ever seen has been the negate modifier. I’ve never seen the bias, comp, or 2x modifiers, so I assumed they don’t exist. In several DX-PS examples I’ve seen in the past, people always used a MAD op to expand range-compressed vectors (hence my speculation about their non-existence). That’s not to say they don’t exist; I’ve just never seen them in any of the examples I’ve looked through, so I could very well be wrong. Anyone here know for sure?!

Dan

Originally posted by Dan82181:
That’s not to say they don’t exist, I’ve just never seen them in all examples I’ve looked through in the past, so I could very well be wrong. Anyone here know for sure?!

They exist in almost all pixel shader versions. _x2 only in ps version 1.4.

For reference see http://msdn.microsoft.com/library/defaul…s/Modifiers.asp

http://msdn.microsoft.com/library/defaul…erModifiers.asp

Originally posted by fresh:
Wouldn’t having a depth fragment output totally ruin the early z test / hyperz optimizations? Maybe the shader compiler can determine whether or not the program is writing to the depth component and optionally enable/disable the z test optimizations.

I don’t think the R200 has any of these early discard features. I may be wrong though.

The Geforce4Tis do have them, and they too can write depth. I guess it’s just as you said, that function is disabled if the shader modifies depth.

Originally posted by Asgard:
They exist in almost all pixel shader versions. _x2 only in ps version 1.4.

Well, I guess since PS1.4 came with DX8.1, those examples I saw must have been DX8.0 pixel shaders, which would explain why I never saw a 2x modifier (nor had I seen the comp before) and why extra ops had to be used to expand normals. Thanks!

Edit:

I also noticed limitations in the DX spec on combining modifiers, notably with the invert modifier and with regard to constants. Those alone are two things I’ve had to do several times. Definitely makes me glad I don’t use DX

Originally posted by zeckensack:
I don’t think the R200 has any of these early discard features. I may be wrong though.
The Geforce4Tis do have them, and they too can write depth. I guess it’s just as you said, that function is disabled if the shader modifies depth.

Pulling the data straight off of ATI’s website
http://www.ati.com/products/pc/radeon8500le/faq.html


Q25: What is HYPER Z™ II?
A25: Z-buffer data is a primary consumer of graphics memory bandwidth, which is the performance bottleneck of most graphics applications. Hence, any reduction in Z-buffer memory bandwidth consumption will result in performance dividends. HYPER Z™ II is a technology that makes Z-buffer bandwidth consumption more efficient by implementing the following memory architecture features:

  1. Fast Z clear
  2. Z compression
  3. Hierarchical Z

HYPER Z™ II is second-generation technology, while other competing technologies have only been introduced for the first time. This results in a more robust and efficient implementation.

Q26: Other graphics manufacturers are claiming new memory bandwidth saving techniques. How does this compare to HYPER Z™ II?

A26: Like HYPER Z™ II, other graphics manufacturer optimizes memory bandwidth. Both HYPER Z™ II and competitor’s solution offer lossless Z-buffer compression. Both technologies attempt to discard polygons that are occluded by other polygons (a process called “occlusion culling”). In this respect, HYPER Z™ II is far superior. HYPER Z™ II saves the GPU from rendering over 14 billion pixels per second, while, it is estimated competitor’s only discards 3.2 billion. Fast Z clear has no counterpart in competitor’s architecture.

I’m guessing that some sort of “early Z out” is present.

Now, I don’t claim to be an expert, but considering when the 8500 came out (more specifically, the timeframe in which the chip was being developed), it would seem like a rather smart move for a graphics chip company not to go through the trouble of computing the color value of a pixel before it gets sent to the scissor, stencil, or depth testing units (note about the alpha test below). Otherwise you would just be eating away at any and all performance you have. What seems to confuse me is this diagram…

http://www.ati.com/developer/sdk/RadeonSDK/Html/Info/RadeonPixelPipeline.html
(keep in mind that it was for the original Radeon/Radeon 7500)

From it, you would think that the color value of the pixel is determined before the scissor test starts. I’m thinking that when a fragment is generated by a “fragment program”, the alpha test either gets skipped or gets moved to after the depth test and before the alpha blend (if that’s possible). You certainly wouldn’t want to go through the trouble of running pixels through a large, complex “fragment program” only to have them killed by the scissor, stencil, or depth test. There may be a reason why a “depth fragment program” is possible under DX but not under OpenGL. Since I don’t work for ATI, I couldn’t tell you for sure what’s going on in the chip or the drivers, or whether it is an IP thing with MS.

Dan

[This message has been edited by Dan82181 (edited 09-25-2002).]

you are talking about deferred shading - not doing shading calcs until framebuffer contents are determined.

only the Kyro board does this AFAIK. the ATI board has the early Z optimization, but i’m sure fragments are shaded before they get alpha/stencil/scissor etc. tested - that’s why the pipeline diagram looks like that.

think about it - if it didn’t, it would have to store, for each framebuffer fragment, enough state info so it could go back and apply the fragment shader or texture environment or whatever when you call glSwapBuffers() - that’s when the framebuffer contents have finally been determined. that’s a lot of extra data per fragment …

marketing material for the Kyro board has a bit of info on deferred shading. it gets around the extra memory problem by only doing small tiles of the framebuffer at a time… i think. i’m a bit hazy on the whole thing.

[This message has been edited by vshader (edited 09-25-2002).]

i’m sure fragments are shaded before they get alpha/stencil/scissor etc. tested

Why? Only the alpha test is guaranteed to have anything to do with the output of the per-fragment operations. It is easy enough to move the depth, stencil, and scissor tests to the beginning of the fragment pipe. That way, if those tests fail, you don’t try to fetch a texture (or 4) or run a complicated fragment program.

The only time you have to (or even should) run any of these tests after the fragment stages is if those programs are going to change the results of the test. As long as the program doesn’t write to the depth (or alter the depth value), then there’s no need to put the depth test after fragment processing.

if it didn’t, it would have to store, for each framebuffer fragment, enough state info so it could go back and apply the fragment shader or texture environment or whatever when you call glSwapBuffers()

Um, no. Observe:

OK, you’re scanconverting a triangle. You get to a pixel. Now, you have fragment information. The thing is, you also have all the info you need to do depth, stencil, and scissor tests. You may as well do those now. Once you’re done, if the pixel wasn’t culled, you go ahead and apply the fragment information to compute the color. Then, based on the alpha test and blend mode, you apply this color. Then, you go on to the next pixel. There’s no need to retain the fragment information until swap buffers is called.
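In rough C, the per-pixel ordering described above might look like this (every type and helper is a hypothetical placeholder for illustration, not a real driver or GL entry point):

```c
/* Sketch of early scissor/stencil/depth testing during scan conversion.
 * Only the alpha test depends on the shaded color, so it stays late. */
typedef struct { float x, y, z; float attribs[8]; } Fragment;
typedef struct { float r, g, b, a; } Color;
typedef struct Framebuffer Framebuffer;

int   scissor_test(const Fragment *f, const Framebuffer *fb);
int   stencil_test(const Fragment *f, Framebuffer *fb);
int   depth_test(float z, Framebuffer *fb);
int   alpha_test(float a);
Color run_fragment_program(const Fragment *f);
void  blend_and_write(Framebuffer *fb, const Fragment *f, Color c);

void shade_pixel(const Fragment *frag, Framebuffer *fb)
{
    /* Tests that don't depend on the shaded color can run first. The depth
     * test can only be hoisted here if the fragment program never writes
     * its own depth value. */
    if (!scissor_test(frag, fb)) return;
    if (!stencil_test(frag, fb)) return;
    if (!depth_test(frag->z, fb)) return;

    /* Only surviving fragments pay for texture fetches and shading. */
    Color c = run_fragment_program(frag);

    /* The alpha test genuinely needs the shaded color. */
    if (!alpha_test(c.a)) return;

    blend_and_write(fb, frag, c);
}
```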

yes Korval, you’re very right and what i said was kinda dumb… you gain less by doing the tests as the fragments come rather than waiting till the buffer is finalised (compared to the deferred shading algorithm)… but don’t you think if the cards did it that way the marketing spiel would trumpet it like they do the Z optimizations? i dunno, i just think it’s strange for ATI to publish a pipeline diagram that makes the system look less efficient than it is… all their 9700 diagrams have Alpha and stencil tests after the frag programs.