NV30 Extensions

As a corollary to my above post –

If you have a list of float textures that represent mip maps of a base float texture, you can do trilinear filtering in the fragment shader.

It’ll be pretty nasty (and I’ll leave coding it as an exercise for the reader), but everything you need to select MIP level and blend is available (i.e., screen-space partial derivatives).
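
Something along these lines, as a very rough Cg sketch: the LOD comes from the derivatives, and its fractional part blends two adjacent levels. Picking the right pair out of N levels (and doing the per-level bilinear by hand, since float textures aren't filtered) is the exercise part; the RECT samplers and the names are assumptions of the sketch.

[code]
// Blend between two adjacent float MIP levels using a derivative-based LOD.
// st is in level-0 texel units; level1 is half the size of level0.
float4 main(float2 st : TEXCOORD0,
            uniform samplerRECT level0,
            uniform samplerRECT level1) : COLOR
{
    // Screen-space footprint of the fragment, measured in level-0 texels.
    float rho    = max(length(ddx(st)), length(ddy(st)));
    float lambda = log2(max(rho, 1.0));   // in [0,1] when these two levels are the right pair

    float4 c0 = texRECT(level0, st);
    float4 c1 = texRECT(level1, st * 0.5);
    return lerp(c0, c1, frac(lambda));
}
[/code]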

For floating point cube maps, all you need to do is save the cube map as a rectangular cross texture and then convert [x y z] into the correct coordinates on the cross.
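
The cross lookup is mostly the standard cube-map face selection followed by an offset into the right tile. A hedged Cg sketch, where the horizontal-cross layout, the tile positions, and all names are my own assumptions:

[code]
// Look up a float "cube map" stored as a cross in a texture rectangle.
// faceSize = edge length of one face in texels.  Assumed tile layout
// (column, row), origin at the lower left:
//            (1,2)=+Y
//   (0,1)=-X (1,1)=+Z (2,1)=+X (3,1)=-Z
//            (1,0)=-Y
float4 main(float3 dir : TEXCOORD0,
            uniform samplerRECT crossMap,
            uniform float faceSize) : COLOR
{
    float3 a = abs(dir);
    float2 sc;      // face-local coordinates in [-1,1]
    float2 tile;    // (column, row) of the selected face

    if (a.x >= a.y && a.x >= a.z) {                   // major axis X
        sc   = float2(dir.x > 0 ? -dir.z : dir.z, -dir.y) / a.x;
        tile = float2(dir.x > 0 ? 2.0 : 0.0, 1.0);
    } else if (a.y >= a.z) {                          // major axis Y
        sc   = float2(dir.x, dir.y > 0 ? dir.z : -dir.z) / a.y;
        tile = float2(1.0, dir.y > 0 ? 2.0 : 0.0);
    } else {                                          // major axis Z
        sc   = float2(dir.z > 0 ? dir.x : -dir.x, -dir.y) / a.z;
        tile = float2(dir.z > 0 ? 1.0 : 3.0, 1.0);
    }

    // Map [-1,1] face coordinates into that face's tile of the rectangle.
    float2 st = (tile + (sc * 0.5 + 0.5)) * faceSize;
    return texRECT(crossMap, st);
}
[/code]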

davepermen,

gking is on the right track here. We don’t have 10 billion transistors to throw at this problem. This has absolutely nothing to do with being “lazy” or anything of the sort. In fact, if we had an unlimited transistor budget, our lives would probably be a lot easier.

Floating-point multipliers are big. Floating-point adders are BIG. And we’re talking about full S1E8M23 precision here.

When you think about float buffer mode, a good analogy is to imagine it as a step as big as the step from color index mode to RGBA mode. Some of the stuff that the previous mode did doesn’t make sense in the new mode.

It is entirely plausible that it might never make sense to support old-style blending in combination with float buffers. And it is virtually guaranteed that filtering of float textures, even if eventually supported, will lead to large slowdowns.

Condition code modifiers do not make instructions any slower. This can lead to nice speedups over the older “SGE/MUL/MAD” approach. (It also gets rid of the whole “0 * NaN = 0, 0 * Inf = 0” annoyance.)

  • Matt

Originally posted by mcraighead:
[b]davepermen,

gking is on the right track here. We don’t have 10 billion transistors to throw at this problem. This has absolutely nothing to do with being “lazy” or anything of the sort. In fact, if we had an unlimited transistor budget, our lives would probably be a lot easier.[/b]

I do understand the hardware limits you run into. But you're saying it would be useless anyway, and that statement is just plain stupid.

Floating-point multipliers are big. Floating-point adders are BIG. And we’re talking about full S1E8M23 precision here.

Well, it depends. Sure, they are big, but for bilinear filtering, for example, you don't need the full precision, and you only need to filter in the 0…1 range, so the multiplication is very different. Take half floats: you could convert them to, say, 64-bit integers without any precision loss, I think (this is off the top of my head, I don't have a calculator here to check whether that would be enough). So sample the four values, convert them to 64-bit integers, do the same bilinear filtering you've done for years, which you know is fast, and convert back…
I KNOW it's not as easy as fixed point, and I KNOW it would be slower than point sampling. But do you think using the Cg function from above will be faster?! Bilinear filtering is a common task…
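
Staying in float math (rather than the 64-bit integer detour), the do-it-yourself bilinear is only a handful of instructions. A minimal Cg sketch, assuming the float texture is a texture rectangle addressed in texel units; the entry point and names are illustrative:

[code]
// Manual bilinear filter of a point-sampled float texture rectangle.
// st is in texel units; texel centers sit at integer + 0.5.
float4 main(float2 st : TEXCOORD0,
            uniform samplerRECT img) : COLOR
{
    float2 base = floor(st - 0.5) + 0.5;    // center of the lower-left texel
    float2 f    = frac(st - 0.5);           // fractional weights

    float4 c00 = texRECT(img, base);
    float4 c10 = texRECT(img, base + float2(1, 0));
    float4 c01 = texRECT(img, base + float2(0, 1));
    float4 c11 = texRECT(img, base + float2(1, 1));

    // Three LRPs and four fetches: slower than dedicated hardware,
    // but hardly exotic.
    return lerp(lerp(c00, c10, f.x), lerp(c01, c11, f.x), f.y);
}
[/code]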

About the cube maps: so we are supposed to code them ourselves. Why the heck did you implement cube maps in the first place, then? Take them out again, no need for them. We render onto a width*(6*height) texture, six times, with glScissor, then we bind it and sample manually. Cube maps are useless; you can drop them as well…

Same for 3D textures and 1D textures. Why the hell do we still have 2D textures? We could do it all with 1D textures…

Really… there is no point in not supporting the stuff you have supported for a long time. It is very handy to have it done automatically, and now we have to do it by hand again. That is where you are being lazy: it doesn't mean more transistors, it just means less driver work for you…

When you think about float buffer mode, a good analogy is to imagine it as a step as big as the step from color index mode to RGBA mode. Some of the stuff that the previous mode did doesn’t make sense in the new mode.

It is entirely plausible that it might never make sense to support old-style blending in combination with float buffers.
Hm… okay, old-style blending is not really needed, or not all of it at least. But simple modulation with the framebuffer, or addition, is quite useful… but then I remember we don't have a real framebuffer anyway… sort of funny… how do we actually draw onto a floating-point buffer? We have four outputs…

Oh, and it's not as big a step as going from 8-bit to 32-bit. From the software side, the math stays the same for most things… one thing that changes is the clamping, so we can now have full dynamic range on parts of the rendering pipeline. I thought you would support a full floating-point version of OpenGL… instead you provide some float render targets, and that's it. No real float textures, no real float screen mode, actually…

And it is virtually guaranteed that filtering of float textures, even if eventually supported, will lead to large slowdowns.
Well… filtering… isn't that actually a*b + c*d + e*f + g*h, with a, c, e, g as the filter kernel and b, d, f, h as the four samples? Isn't that just a DP4 instruction? I don't see the point… you can generate the filter kernel about the same way you did before…
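
Per channel, that really is one dot product; a trivial Cg illustration (names are mine, and producing the weights and the four point samples is where the extra instructions go):

[code]
// One DP4 per color channel: w holds the four bilinear weights (a, c, e, g),
// samples holds the matching channel of the four point samples (b, d, f, h).
float filterChannel(float4 w, float4 samples)
{
    return dot(w, samples);   // compiles to a single DP4
}
[/code]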

Condition code modifiers do not make instructions any slower. This can lead to nice speedups over the older “SGE/MUL/MAD” approach. (It also gets rid of the whole “0 * NaN = 0, 0 * Inf = 0” annoyance.)
That's fine… I know it speeds things up by removing some instructions; my point was just that if, in exchange, the individual instructions executed more slowly, that would be… well… not so nice…

Originally posted by Nakoruru:
Is it just me, or does it seem like with all these extensions I should call this nVidiaGL, because it seems that I could write a program that uses almost no standard OpenGL by using nVidia’s extensions. The only standard thing left seems to be texture objects!

Is it just me, or are the fragment programs, for example, quite difficult to code with, since each register can hold a) a float, b) two half floats, or c) two fixed-point values, or whatever, on top of the branching and all that? Is it just me, or do we now need to use Cg just to get our code readable again?

And why is NV_vertex_program2 not based upon ARB_vertex_program, with the additional instructions on top? That would be cleaner, IMHO…

Well, that's about all I have to rant about at the moment. I just want to state that the NV30 will, at some point in the future, be quite good hardware from what I can see so far… I just think we're still quite far from perfect… farther than I actually thought before reading those specs…

Originally posted by mcraighead:
[b]
Floating-point multipliers are big. Floating-point adders are BIG. And we’re talking about full S1E8M23 precision here.

When you think about float buffer mode, a good analogy is to imagine it as a step as big as the step from color index mode to RGBA mode. Some of the stuff that the previous mode did doesn’t make sense in the new mode.
[/b]

Hmm. Right, well, I must agree with the sentiments expressed by others that the lack of any blending whatsoever in the FP buffers is a real disappointment. OK, so 1-x doesn't make sense if x<0 or x>1; OK, don't allow multiplicative blending; but I can't even do additive blending???

Sigh. Skulks off to wait for NV40.

I am wondering when I can use displacement maps for my shadow volumes in Gizmo3D. When will we see displacement maps on NV architectures?

Oh come on people, don’t be too hard on Matt. My deepest sympathy for everyone involved in trying to get floating point math in 3d.
When floating point in the fragment pipeline was first talked about, my first thought was actually something along the lines of “how are they going to be able to do that?”, knowing the cost of implementing it in hardware. Somehow I believed in the magic of ATI and nVidia engineers anyway. But as it seems now, with all these kinds of restrictions, things are yet again going to be massively painful to code for. Somehow I think we would have been better off waiting another generation for full floating-point fragment shading, and perhaps only taking the step of adding full 16-bit/channel fixed-point 1D/2D/3D/cube textures, possibly even 32-bit/channel fixed point. Then, in the next generation and maybe on a smaller manufacturing process, they might be able to take the real step into floating-point fragment math.

In half a year you will be able to get nice, cheap AMD CPUs of which you can plug up to 16 onto one motherboard… hehe, that is a fully floating-point GPU/VPU/SPU with no restrictions…

Can't wait to trace my rays on that…

Till then I will get the ATI, as it is already here and provides about the same ‘features’…

In about a year, pixel shaders and vertex shaders will finally be real shaders, done in a very useful way, generalized and all… I think by then floating-point math will be fully supported as well… at least, I hope so…

I have to admit that I was a little surprised when I heard about floating-point frame buffers being in the next-generation cards. So it's kind of odd that I should be so disappointed that things are not completely and thoroughly floating point everywhere, with every feature we are used to. I should have been more sceptical. I'm not sure whether to blame myself or the hype machine.

I do not think I should be comparing the R300 or NV30 to some perfect card I can only dream about. The new features are an outstanding improvement. No card can ever compete with the perfect one you imagine in your head (except that maybe my current dream card will be obsolete in 3 years ^_^).

Good Job nVIDIA!

If I really wanted to complain to nVIDIA, it would be about the fact that their NV_* extension specifications altogether probably make a larger document than the OpenGL 1.4 spec.

> And we’re talking about full S1E8M23
> precision here.

If I can’t have that, I’d be perfectly happy with 16-bit FP precision. That’d be enough for me for a long time. I’ve mostly been planning on staying in 16 bit per component anyway, as that doubles your available register space.

If I can’t have 16 bit floats, I’d like 16-bit signed fixed (say, 4.12 or even 2.14) in as many places as possible.

Btw: you absolutely need multiplicative and additive blending in any reasonable graphics set-up, so it makes sense to say “we can’t do mult and add, so you don’t get anything at all.” Whether you need 1-x, or 1-clamp(x,0,1), or something like that is less clear. After all, you COULD pre-multiply when you generate the initial data instead, assuming you have enough input data to go around.

“Plastic transparency” where you typically use A,1-A blending currently (as opposed to regular transparency, which is just multiplicative) should be done as multiplicative + diffuse/specular anyway.
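
For reference, the two formulations being contrasted, written out as the math a blender or shader would evaluate (a plain sketch, not tied to any particular hardware; the function names are mine):

[code]
// Straight "plastic" alpha blend: the blender needs the (1 - a) factor.
float3 blendStraight(float3 src, float a, float3 dst)
{
    return src * a + dst * (1.0 - a);
}

// Premultiplied variant: src was multiplied by a when the data was
// generated, so only one multiply by (1 - a) and an add remain.
float3 blendPremultiplied(float3 srcTimesA, float a, float3 dst)
{
    return srcTimesA + dst * (1.0 - a);
}
[/code]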

(ToolTech) I am wondering when I can use displacement maps for my shadow volumes in Gizmo3D. When will we see displacement maps on NV architectures?

I can't recall the source, but I've read somewhere that NV30 will allow “render to vertex array”.
Now, with the NV_pixel_data_range thing, I imagine it this way: you render the depth map to a texture, do a glReadPixels into VAR memory, and then you have your shadow volume grid ready to render (am I correct?).

(davepermen) And why is NV_vertex_program2 not based upon ARB_vertex_program, with the additional instructions on top? That would be cleaner, IMHO…

I fully agree. And the same applies to fragment_program. The fact that ARB_fragment_program doesn't exist yet is no excuse. Program-object management, parameter loading, etc. have been defined and should be reused, even if the instruction syntax rules were much extended.
I like the NV30 hardware features (despite the limits), but the whole new NV30 extension pack is proof that OpenGL 2.0 is the only hope.

I would not do any readpixels. Just keep it as a texture and render the displacement map to generate the volume.

OK, I might have misunderstood you at first glance. Good idea!

MZ, do you mean that nVIDIA should reuse their own program loading API (you cannot really mean that because they do) or that they should use OpenGL 2.0’s?

The only painful limitation is the inability to blend floating point framebuffers. However, if all you are interested in is one or two components, you can use your shader to pack higher precision data into the frame buffer (2x12-bit or 1x24-bit).
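
As a sketch of the 1x24-bit case, a value in [0,1) can be split across three 8-bit channels of an ordinary fixed-point target and reassembled later; the function names and the exact split are illustrative assumptions:

[code]
// Pack one value v in [0,1) into three 8-bit channels (24 effective bits).
float4 pack24(float v)
{
    float x = floor(v * 16777215.0);             // quantize to 24 bits (2^24 - 1)
    float r = floor(x / 65536.0);                // high byte
    float g = floor((x - r * 65536.0) / 256.0);  // middle byte
    float b = x - r * 65536.0 - g * 256.0;       // low byte
    return float4(r, g, b, 0.0) / 255.0;         // what the 8-bit buffer will store
}

// Recover the value when the buffer is later read back as a texture.
float unpack24(float4 rgba)
{
    float3 bytes = floor(rgba.rgb * 255.0 + 0.5);
    return (bytes.r * 65536.0 + bytes.g * 256.0 + bytes.b) / 16777215.0;
}
[/code]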

In most cases, floating point textures will be used primarily as intermediate buffers in screen space (so filtering them doesn’t make too much sense). And, if you’re focused on image quality enough to use floating point textures, you probably don’t want to use just linear interpolation.

And I wasn’t thinking clearly last night – mipmapping/trilinear filtering a floating point texture doesn’t require multiple textures. In fact, it becomes much less ugly when all MIP levels are stored in one.
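
One possible single-texture layout: put level 0 at the left edge of a rectangle and pack each coarser level immediately to its right (total width stays under 2W). Addressing a given level then needs no branching; the layout, the helper, and its names are assumptions of this sketch:

[code]
// Fetch MIP level "level" from an atlas rectangle holding all levels of a
// W x H float texture, packed left to right.  st is in level-0 texel units.
float4 sampleLevel(samplerRECT atlas, float2 st, float level, float W)
{
    float scale  = exp2(-level);               // 1, 1/2, 1/4, ...
    float xStart = 2.0 * W * (1.0 - scale);    // left edge of this level
    return texRECT(atlas, float2(xStart, 0.0) + st * scale);
}
[/code]

With a fractional LOD, call it at floor(lambda) and floor(lambda)+1 and lerp, as in the earlier sketch.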

Not having floating-point blending isn’t that much of a concern. After all, by the next rev, we won’t have any blending at all: it’ll just be a per-fragment parameter that we can do with as we please.

BTW, what happens when I want to have 4 per-pixel bump-mapped lights, where the tangent-space transform is computed in the fragment shader? NV30 doesn’t really provide for that, since it can only pass 10 parameters (only 8 of which are full-precision). I’d need at least 13 parameters for 4 lights.

Of course, 8 is larger than the 4 I have now… but not that much larger than the 6 that the 8500 provided last year.


Nakoruru,
I meant they should “freeze” NV VP and use the ARB VP interface as the basis for any new GL 1.x, asm-style VP or FP.

Korval –

4 eye space light positions passed as parameters into the fragment shader (updated every frame)

an interpolated 3x3 tangent->eye matrix (or eye->tangent)
1 interpolated eye space object position

an eye space eye position constant at (0, 0, 0)

So, with 4 interpolants (3, if you recompute B=NxT every fragment) you can have quite a few more than 4 lights, and the resulting quality will be better than interpolating H and L per-vertex.

Of course, the performance won’t be as good, but you should be able to do some load balancing between the vertex and fragment programs with the remaining 5 interpolants.
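
A diffuse-only Cg sketch of that layout, with the lights as fragment program constants and only the tangent-to-eye basis plus the eye-space position interpolated (all names, the four-light count, and the normal-map lookup are illustrative assumptions):

[code]
// Per-pixel lighting with light positions held in fragment program constants.
float4 main(float3 Teye : TEXCOORD0,            // tangent, eye space (interpolated)
            float3 Neye : TEXCOORD1,            // normal, eye space (interpolated)
            float3 Peye : TEXCOORD2,            // position, eye space (interpolated)
            float2 st   : TEXCOORD3,            // normal map coordinates
            uniform sampler2D normalMap,
            uniform float3 lightPos[4],         // eye-space light positions (constants)
            uniform float3 lightCol[4]) : COLOR
{
    // Rebuild the third basis vector per fragment (saves one interpolant).
    float3 T = normalize(Teye);
    float3 N = normalize(Neye);
    float3 B = cross(N, T);

    // Tangent-space normal from the map, rotated into eye space.
    float3 nTan = tex2D(normalMap, st).xyz * 2.0 - 1.0;
    float3 n    = normalize(nTan.x * T + nTan.y * B + nTan.z * N);

    float3 result = float3(0, 0, 0);
    for (int i = 0; i < 4; i++) {               // unrolled by the compiler
        float3 L = normalize(lightPos[i] - Peye);
        result  += lightCol[i] * max(dot(n, L), 0.0);
    }
    return float4(result, 1.0);
}
[/code]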

Hi!
Just a couple of my thoughts about float textures…
I think the lack of filtering for float textures shouldn't be considered a missing feature. Instead we should simply assume that NV_fragment_program bypasses not only the standard texture application, color sum, etc., but also the texture fetching mechanism. It's just another step towards “replace fixed functionality with programmability”. We can now code any filtering scheme we want. With fixed function there is only nearest, linear and anisotropic filtering, with mipmaps (nearest or linear). What if I want cubic filtering? Or a summed-area table for minification? Or linear filtering on the s coordinate and nearest on t (sketched below)? All of this is possible with a fragment program (I think it is). The standard filtering types are just a couple of functions in Cg. The same goes for 1D, 2D, 3D and cube textures. What if I want a cube map access with correct filtering across face boundaries? Or a 4D texture?
Another story is the speed of dedicated filtering hardware vs. filtering “emulated” in a fragment program… But it's like dedicated lighting hardware being twice as fast as lighting calculated in a vertex program on NV20. Nobody is crying that we cannot use it when we use vertex programs, especially since vertex programs get faster and faster.
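
To make one of those concrete, here is what the “linear on s, nearest on t” case might look like (a hedged sketch; RECT texel-unit coordinates and the helper name are my assumptions):

[code]
// Linear filter along s, nearest along t.  st is in texel units.
float4 linearSnearestT(samplerRECT img, float2 st)
{
    float t  = floor(st.y) + 0.5;            // snap t to a texel center
    float s0 = floor(st.x - 0.5) + 0.5;      // left neighbour's center
    float f  = frac(st.x - 0.5);             // weight between the two neighbours
    return lerp(texRECT(img, float2(s0,       t)),
                texRECT(img, float2(s0 + 1.0, t)),
                f);
}
[/code]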

But I think there's NO EXCUSE for not supporting any blending at all.

Coop

So if an app is suffering from banding due to blending many textures, it will not benefit from the NV30's implementation of FP color?

Originally posted by Korval:
[b]Not having floating-point blending isn’t that much of a concern. After all, by the next rev, we won’t have any blending at all: it’ll just be a per-fragment parameter that we can do with as we please.

BTW, what happens when I want to have 4 per-pixel bump-mapped lights, where the tangent-space transform is computed in the fragment shader? NV30 doesn’t really provide for that, since it can only pass 10 parameters (only 8 of which are full-precision). I’d need at least 13 parameters for 4 lights.

Of course, 8 is larger than the 4 I have now… but not that much larger than the 6 that the 8500 provided last year.[/b]

Think about it…

You have full floating-point values IN the fragment program. That also means full floating-point constants… so why don't you store the lights in the constants and simply send over the tangent space? You don't need to send over the screen-space position either, by the way; you get it for free. Store the tangent space as a quaternion, and you only need to send one 4D texture coordinate. No need for anything else…

Vertex programs are not needed anymore for any shading, only for animating/skinning/tweening, whatever… not even for precalculating some lighting data…
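
A sketch of that idea: the vertex program passes the tangent frame as one unit quaternion in a 4D texture coordinate, and the fragment program rotates eye-space vectors with it (which direction the rotation goes depends on how the quaternion is built). Whether this actually beats three interpolated basis vectors on NV30 is untested here; the helper and all names are illustrative:

[code]
// Rotate vector v by unit quaternion q = (x, y, z, w).
float3 quatRotate(float4 q, float3 v)
{
    float3 t = 2.0 * cross(q.xyz, v);
    return v + q.w * t + cross(q.xyz, t);
}

float4 main(float4 qTangent : TEXCOORD0,        // tangent frame as a quaternion
            float3 Peye     : TEXCOORD1,        // eye-space position
            uniform float3 lightPosEye) : COLOR // one eye-space light constant
{
    float4 q    = normalize(qTangent);          // re-normalize after interpolation
    float3 Leye = normalize(lightPosEye - Peye);
    float3 Ltan = quatRotate(q, Leye);          // light direction in tangent space
    return float4(Ltan * 0.5 + 0.5, 1.0);       // visualize; real shading goes here
}
[/code]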