new shading mode

Well, that depends on how it’s implemented. You can have one set of geometry for drawing and one set for collision detection … but that will of course take up more memory.

humus is right, Phong could be implemented under DirectX 8 using a custom shader (but then again, we don’t care)

Under OpenGL, if the card supports dot3, Phong should be almost the same thing.
But dot3 is awfully slow (looks great, tho).
And dot3 can be approximated too, if you consider the light source to be at an infinite distance… anyway… I think a Phong approximation is better.
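
To illustrate what “light source at an infinite distance” means for the math: the light vector becomes one constant for the whole scene, so the per-pixel work is just a single dot product with the interpolated normal. A minimal sketch in plain C (the vector type and function names are made up for illustration):

[code]
/* Diffuse term for a directional ("infinitely distant") light.
   Because L is the same everywhere, only the normal varies per pixel. */
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* n: interpolated surface normal (assumed unit length)
   l: fixed, normalized direction toward the light (e.g. the sun) */
float diffuse_directional(vec3 n, vec3 l)
{
    float d = dot3(n, l);
    return d > 0.0f ? d : 0.0f;   /* clamp away back-facing contribution */
}
[/code]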

Since when does adding a feature to openGL mean we have to add more silicon to our graphics cards? Phong shading doesn’t have to be supported in hardware: having it supported in openGL means that those who want to use it don’t have to write their own proprietary routine using some irritating extension like NV_sub_transistor_combiners! They can just trust that if their graphics card supports it they will have an efficient implementation.

PS RISC evangelists should note that openGL is not a RISC machine! If you want your openGL accelerator to be a RISC machine then that is a very different issue.

PPS cards with dot3 bump mapping are nearly there anyway. All we need is procedural bump maps (in the case of Phong shading the procedure would be interpolation of vertex normals). These would have many benefits beyond Phong shading. Anyway, the point is that the hardware is not as far from supporting Phong shading as openGL 1.3 is.

OpenGL is not a machine, let alone a RISC machine. It was an analogy; nothing more.
Oh, and another thing: OpenGL is an abstraction of the graphics hardware.

[This message has been edited by john (edited 08-21-2000).]

MrMADdood: >>Under OpenGL, if the card supports dot3, Phong should be almost the same thing. But dot3 is awfully slow (looks great, tho) and dot3 can be approximated too, if you consider the light source to be at an infinite distance

So, we can approximate Phong with dot3, and consider the light at infinity?

Oh boy, a light at infinity will flatshade a cube’s face. And this was exactly the thing you DIDN’T want… or?

If you have Phong shading you can throw away almost all lightmaps, so you don’t need as many multitexture units (which gives you some transistors back). Another solution would be not to interpolate the normals of a triangle for every pixel, but only at some points inside the triangle, and do Gouraud shading between them (a hardware tessellator?). If you can control how many points get ‘Phong’-interpolated, you can trade off speed against accuracy.
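
A rough sketch of that trade-off in plain C (all names are made up for illustration; this is not how any hardware tessellator actually exposes it):

[code]
/* Interpolate the normal only at 'samples' points across a scanline span,
   light there, and fill linearly (Gouraud) in between.
   samples == 2      -> ordinary Gouraud across the span
   samples == width  -> full per-pixel Phong shading
   Assumes width >= 2, samples >= 2, and non-opposed endpoint normals. */
#include <math.h>

typedef struct { float x, y, z; } vec3;

static vec3 mix_unit(vec3 a, vec3 b, float t)      /* lerp, then renormalize */
{
    vec3 m = { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
    float inv = 1.0f / sqrtf(m.x * m.x + m.y * m.y + m.z * m.z);
    m.x *= inv; m.y *= inv; m.z *= inv;
    return m;
}

/* 'illuminate' stands in for whatever lighting model gets applied. */
void shade_span(vec3 n_left, vec3 n_right, int width, int samples,
                float (*illuminate)(vec3), float *out)
{
    float prev_i = illuminate(n_left);
    int   prev_x = 0;

    for (int s = 1; s < samples; ++s) {
        int   x   = s * (width - 1) / (samples - 1);        /* next sample position */
        float i   = illuminate(mix_unit(n_left, n_right, (float)x / (float)(width - 1)));
        int   run = x - prev_x;

        for (int p = prev_x; p <= x; ++p)                   /* Gouraud between the samples */
            out[p] = run ? prev_i + (i - prev_i) * (float)(p - prev_x) / (float)run : i;

        prev_i = i;
        prev_x = x;
    }
}
[/code]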

[This message has been edited by Marc (edited 08-22-2000).]

It may interest you to know that the SGI V6 and V8 Vpro graphics sets on Octane2 implement SGIX_fragment_light. Sort of phong but not quite.

So someone’s doing something about it.

Originally posted by Bob:
So, we can approximate Phong with dot3, and consider the light at infinity?

- that’s silly… it’s not what I meant.
I meant dot3 looks good that way… alone.

Originally posted by Bob:
Oh boy, a light at infinity will flatshade a cube’s face. And this was exactly the thing you DIDN’T want… or?

Depends… if you are using Phong to show the light falloff across a fragment, yes, it would look flatshaded. If you use it to make it look like the fragment is bent, then the answer is no. The fact that the light is considered to be at infinity doesn’t take away its direction. (Example: sunlight on a sphere.)

Originally posted by john:
[b]OpenGL is not a machine, let alone a RISC machine. It was an analogy; nothing more.
Oh, and another thing: OpenGL is an abstraction of the graphics hardware.[/b]

OpenGL IS a STATE machine. That state machine has instructions which change its state, and those have to be considered CISC instructions. If you thought your analogy was irrelevant, why did you use it?

And yes, openGL is an abstraction of graphics hardware, and Phong shading ‘should’ be implemented in hardware, but for many years we didn’t have hardware transformation on a lot of cards: we can’t accelerate the whole API right here, right now, but if we ever want Phong shading in hardware, it has to be included in the API first. Is Phong shading really such a high-level feature? It certainly can’t be done within the framework we have now, so the only conclusion is that it is a missing feature. If you don’t want Phong shading in the API, there really is no argument for adding any other features to openGL at all, since there aren’t many that are lower level than this.

OpenGL is a state machine, yes, but not a CPU. So drawing parallels between instruction sets is meaningless, because OpenGL does not decode instructions. CISC and RISC refer to the ISA. OpenGL does not HAVE an ISA, so how can you say whether it is a CISC (ISA) or a RISC (ISA)? You can’t. An FSM != a CPU.

My analogy was in the same vein as RISC v CISC. More complicated chips are difficult to make faster.

My argument wasn’t about NOT adding Phong shading. My argument was: it can be synthesised anyway by using a higher reoslution mesh, therby making the linear interpolation haev a smaller error term along a scan line. Why add THIS particular feature, over all others, just because people can’t be bothered to refine a mesh? There are OTHER shader models out there, like Torrens-Sparrow, for example. Phong isn’t magical. Why leap onto the Phong bandwagon—which you can emulate, anyway, with existing hardware—when there are alternatives? That is my argument. Not that phong is bad and shouldn’t be implemented, but just to keep it in perspective.

Another thing: drawing analogies between Gouraud shading and Phong shading on CPUs doesn’t necessarily carry over to the same comparison on a graphics chip (which is what that SIGGRAPH paper was about). Graphics chips have fewer resources, and in different places, than a CPU.

Don’t get me wrong. Phong is good. But just make sure it’s implemented for the right reasons.

cheers,
John

[This message has been edited by john (edited 08-30-2000).]

[QUOTE]
Why leap onto the Phong bandwagon (which you can emulate, anyway, with existing hardware) when there are alternatives?
[/QUOTE]

Why not implement it when current hardware is almost doing it in the case of DOT3?

Suppose we have so many triangles in a scene that they are all 1 pixel in size. Then we are processing three normals for each pixel; if we had Phong shading we would use fewer triangles and get the same effect of one normal per pixel.

Phong shading fundamentally adds per-pixel normals. It is a method for interpolating shading, not an illumination model. The Phong illumination model is a different thing. All the more complex illumination models could be implemented with Gouraud shading if you really wanted, but the illumination model can only be evaluated where the normals are! The question is: where do we stick our normals?
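
To underline the distinction, here is the Phong illumination model written out in plain C (a hedged sketch; the names are made up and nothing here is OpenGL-specific). The formula is the same whether you evaluate it per vertex (Gouraud) or per pixel (Phong shading); only where the normals live changes.

[code]
/* Phong illumination model: ambient + diffuse + specular.
   Where you evaluate it (per vertex or per pixel) is the shading method,
   independent of the formula itself. */
#include <math.h>

typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

float phong_illumination(vec3 n,            /* unit surface normal   */
                         vec3 l,            /* unit vector to light  */
                         vec3 v,            /* unit vector to viewer */
                         float ka, float kd, float ks, float shininess)
{
    float n_dot_l = dot3(n, l);
    if (n_dot_l < 0.0f) n_dot_l = 0.0f;

    /* reflect l about n:  r = 2(n.l)n - l */
    vec3 r = { 2.0f * n_dot_l * n.x - l.x,
               2.0f * n_dot_l * n.y - l.y,
               2.0f * n_dot_l * n.z - l.z };
    float r_dot_v = dot3(r, v);
    if (r_dot_v < 0.0f) r_dot_v = 0.0f;

    return ka + kd * n_dot_l + ks * powf(r_dot_v, shininess);
}
[/code]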

An FSM != a CPU, but a CPU == an FSM.

[This message has been edited by foobar (edited 08-31-2000).]

If all polygons are one pixel in size, we will get the same effect as Phong shading, even if we are using face normals and flat shading.

The huge amount of one-pixel triangles will kill you. It’s not an efficient way.

I think the right method is to do the lighting computation at the fragment stage, just like the pixel shader in DX8. Right now it can’t do a complex illumination computation, but future versions will be able to. This programmability will give us great flexibility.

It seems that everybody wants more CONSTANTs for the glShadeModel() function… Afterwards, whether it’s implemented in h/w or not is the vendor’s choice.
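
Presumably something along these lines; note that GL_PHONG_EXT below is entirely hypothetical and does not exist in any spec, only GL_FLAT and GL_SMOOTH are real:

[code]
/* Today (OpenGL 1.x): per-vertex lighting, colors interpolated across the triangle. */
glShadeModel(GL_SMOOTH);

/* What this thread is asking for: a hypothetical extra constant that would make
   the implementation interpolate normals and evaluate lighting per fragment.
   GL_PHONG_EXT does NOT exist; it is shown only as an illustration. */
#ifdef GL_PHONG_EXT
glShadeModel(GL_PHONG_EXT);
#endif
[/code]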

oh this brought a tear to my eyes…

there is light at the end of the tunnel… and it takes 6 texturing passes

put some ice on your texturing units and go check out: http://www.nvidia.com/Marketing/Developer/DevRel.nsf/pages/A29998E29896BE8E8825695C0004D163

Be sure to press F6 to check out the number of triangles in the scene.
Seems dot-3 bumpmapping interpolates the light normal just like Phong shading…

[This message has been edited by MrMADdood (edited 09-24-2000).]

I guess I should reply, seeing that the above link points to the per pixel lighting demo I wrote:

My take on this is to NOT add phong shading. Instead we need more texture units and more flexible per-pixel computations. My demo does very realistic lighting in 6 passes. This lighting includes diffuse and specular lighting/bumpmapping, gloss mapping, and distance attenuation.

By upping the texture units to 4, this lighting model (plus 1 or 2 additional effects tossed in) could be done in 2 passes.
The benefit of implementing several texture units and flexible per-pixel computations is that the hardware that enables me to create this lighting model can be used to create something completely different for someone else (like maybe quickly generating dynamic textures). If this lighting model were implemented directly in hardware, that’s about all you could use it for.

NVIDIA is on the right track with their design. Per pixel lighting is definitely the way to go. Given enough general hardware, you can implement just about any complex lighting model your heart desires while leaving the door wide open for using the hardware in other creative ways.

MrMADdood said:
Seems dot-3 bumpmapping interpolates the light normal just like Phong shading

Well, the vector interpolation isn’t done through any special dot-3 feature. It’s done using a generic feature: a texture unit with a cubemap texture.

The other key to this is the flexible register combiner scheme that allows you to perform arbitrary calculations per pixel. Some of the stuff I did probably couldn’t be done purely with dot-3 hardware. But by making the hardware that does the dot-3 calculations flexible (which is kinda what the register combiners do, and then some), almost the same silicon can have so many more possibilities.
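
For anyone wondering what “a texture unit with a cubemap texture” means in practice: you fill each face of a cube map with the normalized direction vector of every texel, packed into RGB, and then a per-pixel texture lookup hands back a renormalized vector for free. Here is a rough sketch of building one face in plain C (names are made up; this is not the demo’s actual code, and the other five faces just use different axes):

[code]
/* Build the +X face of a "normalization cube map": each texel stores the unit
   direction vector toward that texel, packed into RGB bytes.  Looking up a
   per-pixel (unnormalized) vector in this cube map renormalizes it without any
   per-pixel square root. */
#include <math.h>

void build_posx_face(unsigned char *rgb, int size)
{
    for (int t = 0; t < size; ++t) {
        for (int s = 0; s < size; ++s) {
            float sc = 2.0f * (s + 0.5f) / size - 1.0f;   /* face coords in [-1, 1] */
            float tc = 2.0f * (t + 0.5f) / size - 1.0f;

            float x = 1.0f, y = -tc, z = -sc;             /* direction for the +X face */
            float inv = 1.0f / sqrtf(x * x + y * y + z * z);

            unsigned char *p = rgb + 3 * (t * size + s);  /* pack [-1, 1] into [0, 255] */
            p[0] = (unsigned char)(255.0f * (x * inv * 0.5f + 0.5f));
            p[1] = (unsigned char)(255.0f * (y * inv * 0.5f + 0.5f));
            p[2] = (unsigned char)(255.0f * (z * inv * 0.5f + 0.5f));
        }
    }
}
[/code]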

[This message has been edited by LordKronos (edited 09-25-2000).]

MrMADdood bows to LordKronos

“yes sire, thou art thee man”

I am guessing that adding more texturing units is more expensive than adding Phong shading - yes, they are more flexible, but as soon as you decide to do complex per-pixel lighting you lose all your texture units!!

If Phong shading were added, and then we had something like register combiners on the lighting pipeline, you could save all your texturing units for other effects. And since video-memory bandwidth is currently the performance-limiting factor, adding more texturing units may not be the answer: eventually you have to stop reading/writing to/from memory and instead do calculations on the GPU to go faster. Phong shading is the perfect place to do this, and it will lead to dramatic increases in image quality without wasting texturing bandwidth.

OK, this is a bit long, so bear with me…

Originally posted by foobar:
I am guessing that adding more texturing units is more expensive than adding Phong shading - yes, they are more flexible, but as soon as you decide to do complex per-pixel lighting you lose all your texture units!!

The problem is, what will Phong shading get you? Smooth interpolation of the surface normal. If you want realistic rendering, you are usually going to want textured surfaces. Bump mapping is going to make surfaces look a lot better than simple Phong lighting. In order to do bump mapping, you need to use a texture. From here, it’s only a small leap to doing the whole equation using texture units. Now, if the hardware SPECIFICALLY does Phong shading, how are you going to integrate that into your bump mapping? The answer is that you can’t (not if you want something realistic looking), because the result of the Phong lighting doesn’t take into account the local (pixel-level) surface irregularities. So if you want to do bump mapping, you are going to ignore the result of the Phong shading unit, which equates to wasted silicon.

Another thing to consider is that Phong shading would be quite expensive, requiring 2 square roots per pixel. Not sure if you know or not, but a square root is quite expensive. Doing it via texture units requires no square roots, making it more viable for inexpensive and high-performance graphics cards. If it were feasible to implement a per-pixel square root, I would rather have that exposed bare through the register combiners. I can tell you from my work, I’ve never needed a Phong lighting unit, but I could have used a square root unit once or twice (and for things that have nothing to do with Phong anything). That was my reason for calling for more texture units and more flexible per-pixel calculations. If that hardware were built to do Phong lighting, that’s all you could do (and would you even do that, or would you rather bump map?), but if the same functionality were provided raw, you could use it in so many ways.
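
For reference, the square roots come from renormalizing the interpolated vectors before the per-pixel lighting math, roughly like this (a plain C sketch of where the cost comes from, not anyone’s actual implementation):

[code]
#include <math.h>

typedef struct { float x, y, z; } vec3;

static vec3 normalize3(vec3 v)                  /* one square root */
{
    float inv = 1.0f / sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    vec3 r = { v.x * inv, v.y * inv, v.z * inv };
    return r;
}

/* Per pixel: the interpolated normal N and (for Blinn-style specular) the
   interpolated half vector H both drift off unit length across the triangle,
   so each needs its own renormalization -- two sqrts per pixel in math,
   or lookups into a normalization cube map if texture units do it instead. */
void renormalize_per_pixel(vec3 *n, vec3 *h)
{
    *n = normalize3(*n);   /* sqrt #1 */
    *h = normalize3(*h);   /* sqrt #2 */
}
[/code]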

And since video-memory bandwidth is currently the performance-limiting factor, adding more texturing units may not be the answer: eventually you have to stop reading/writing to/from memory and instead do calculations on the GPU to go faster. Phong shading is the perfect place to do this, and it will lead to dramatic increases in image quality without wasting texturing bandwidth.

But with more texture units, you can collapse things into far fewer passes. Right now, cards are severely bandwidth limited. What we need to do is minimize that bandwidth. There are many ways to do so, and more texture units is one of those ways. (Yes, I know the intuitive thing is to think the opposite, but…)

In my demo, I required 6 passes (5 of which were dual textured) to do my lighting. That means it took 11 texture accesses, 6 z-buffer reads, 1 depth buffer write, 6 color buffer reads, and 6 color buffer writes. With 32-bit color and depth buffers, that’s 19*4 = 76 bytes plus 11 texture accesses per pixel. Using the simplest texturing scheme (nearest texel, no mipmapping), that would be 11*4 = 44 bytes for the textures. A total of 76 + 44 = 120 bytes of bandwidth per pixel.

Now to contrast, let’s assume we have a card with 4 texture units. The same lighting could be done in 1 quad-textured pass + 1 single-textured pass. That’s 5 texture accesses, 2 depth reads, 1 depth write, 2 color buffer reads, and 2 color buffer writes. That’s 7*4 = 28 bytes plus 5 texture accesses per pixel. Again, assuming nearest-texel, non-mipmapped texturing, that’s 28 + 20 = 48 bytes of bandwidth per pixel. Only 40% as much bandwidth. How can that be?

More texture units require LESS bandwidth? The reason, as described above, is that there is a LOT of overhead in each pass.
Also, comparing fewer textures/more passes to more textures/fewer passes, I can say this: with more texture units, in the BEST case, the texture bandwidth is the same as with fewer units. If it takes 5 textures either way, you have the same 5 texture accesses spread out over a different number of passes. However, that is the BEST case. In my example, I needed 11 texture accesses on dual-texture hardware vs. 5 on quad-texture hardware. The reason for this is that, in the process of performing diffuse and specular lighting, I had to calculate attenuation in the first pass, multiply that by the diffuse Blinn lighting calculation in the second pass, and multiply that by the texture color and the spotlight filter in the third pass. Then in the fourth pass, I had to calculate attenuation AGAIN. In the fifth pass, I had to calculate the specular Blinn lighting equation (which required me to access the bump map AGAIN). Then in the sixth pass, I had to multiply by the diffuse texture AGAIN and the point light filter AGAIN. In all, I had to access some of the textures in multiple passes, causing redundant bandwidth usage.
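
The arithmetic above in one place, so the totals are easy to verify (4 bytes per framebuffer or texel access, nearest-texel textures, as assumed above):

[code]
/* Per-pixel bandwidth for the two cases discussed above, in bytes. */
#include <stdio.h>

int main(void)
{
    /* 6 passes on dual-texture hardware */
    int fb_dual  = (6 + 1 + 6 + 6) * 4;   /* z reads + z write + color reads + color writes */
    int tex_dual = 11 * 4;                /* 11 texture accesses */

    /* 2 passes on hypothetical quad-texture hardware */
    int fb_quad  = (2 + 1 + 2 + 2) * 4;
    int tex_quad = 5 * 4;

    printf("dual-texture: %d bytes/pixel\n", fb_dual + tex_dual);   /* 76 + 44 = 120 */
    printf("quad-texture: %d bytes/pixel\n", fb_quad + tex_quad);   /* 28 + 20 = 48  */
    return 0;
}
[/code]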

This was a bit long, but hopefully it helps you see that while more texture units look like they require more bandwidth, in a practical implementation they require less. I also hope you can see that dedicated Phong hardware would be a waste when that same silicon could instead go toward general-purpose per-pixel calculations.