Originally posted by mcraighead:
[b]davepermen,
gking is on the right track here. We don’t have 10 billion transistors to throw at this problem. This has absolutely nothing to do with being “lazy” or anything of the sort. In fact, if we had an unlimited transistor budget, our lives would probably be a lot easier.[/b]
i do understand the HARDWARE limits you run into. but you're saying it will be useless anyway, and that statement is just plain stupid.
[b]Floating-point multipliers are big. Floating-point adders are BIG. And we’re talking about full S1E8M23 precision here.[/b]
well, it depends. sure, they are big, but for bilinear filtering, for example, you don’t need the full precision, and you only ever filter with weights in the 0…1 range, so the multiplication is much simpler. take half floats: you could convert them to… 64bit integers? without any precision loss, i think (this is from memory, i don’t have a calculator here to check if 64 bits are enough…). so: sample the 4 values, convert them to 64bit integers, do the same bilinear filtering you have done for years (which you know is fast), and convert the result back…
i KNOW it’s not as easy as fixed point, and i KNOW it would be slower than point sampling. but do you think using the cg function from above will be faster?! bilinear filtering is a common task…
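the half-float-to-integer idea above can be checked on paper: every finite half float (1 sign, 5 exponent, 10 mantissa bits) is an integer multiple of 2^-24, and the largest finite value is 65504, so scaling by 2^24 gives integers of at most ~41 bits, which fit a 64bit integer losslessly. here is a rough sketch in python; the function names and the 8-bit weight resolution are my own choices, not anything from the specs:

```python
HALF_ULP = 2 ** 24          # smallest positive subnormal half float is 2**-24
WEIGHT_ONE = 256            # 8-bit fixed-point filter weights (an assumption)

def half_to_fixed(x: float) -> int:
    # assumes x is exactly representable as a half float,
    # i.e. an integer multiple of 2**-24 with |x| <= 65504
    n = round(x * HALF_ULP)
    assert n == x * HALF_ULP, "value is not an exact half float"
    return n

def bilinear_fixed(s00, s01, s10, s11, fx, fy):
    """Integer bilinear filter; fx, fy are 8-bit fractions (0..256)."""
    a = s00 * (WEIGHT_ONE - fx) + s01 * fx
    b = s10 * (WEIGHT_ONE - fx) + s11 * fx
    return (a * (WEIGHT_ONE - fy) + b * fy) >> 16   # divide by 256*256

samples = [1.0, 0.5, 0.25, 2.0]           # all exact half-float values
fixed = [half_to_fixed(s) for s in samples]
result = bilinear_fixed(*fixed, fx=128, fy=128) / HALF_ULP
print(result)   # 0.9375, the average of the four samples
```

only the final conversion back to half would need rounding; the integer part of the filter itself is exact.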
about the cubemaps. so… we code them ourselves. why the heck did you implement cubemaps in the first place, then? take them out again… no need for them. we render into a width*(6*height) texture, 6 times, with glScissor, then we bind it and sample manually. by that logic cubemaps are useless and you can drop them as well…
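to show what “sampling manually” from such a stacked texture actually costs in shader math, here is a sketch of the cube map face selection (following the face/sc/tc/ma rules from the OpenGL spec; the vertical atlas layout and the function names are my own convention for this example, and the per-face texel orientations are simplified):

```python
def cubemap_face_st(rx, ry, rz):
    """Map a direction vector to (face, s, t), following the
    OpenGL cube map face-selection rules."""
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    if ax >= ay and ax >= az:
        face, sc, tc, ma = (0, -rz, -ry, ax) if rx > 0 else (1, rz, -ry, ax)
    elif ay >= az:
        face, sc, tc, ma = (2, rx, rz, ay) if ry > 0 else (3, rx, -rz, ay)
    else:
        face, sc, tc, ma = (4, rx, -ry, az) if rz > 0 else (5, -rx, -ry, az)
    s = (sc / ma + 1) / 2
    t = (tc / ma + 1) / 2
    return face, s, t

def atlas_coords(face, s, t, w, h):
    # faces stacked vertically in a w x (6*h) texture, as in the rant above
    return s * w, (face + t) * h

# looking straight down +z hits the centre of face 4
print(cubemap_face_st(0.0, 0.0, 1.0))   # (4, 0.5, 0.5)
```

all of this branching and dividing is exactly what the fixed-function cube map lookup does for free, which is the point of the complaint.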
same for 3d textures and 1d textures. why the hell do we still have 2d textures? we could do it all with 1d textures…
really… there is no point in not supporting the stuff you have supported for a long time. it is very handy to have it done automatically, and now we have to do it by hand again… that is where you are lazy… it doesn’t cost more transistors… it just means less driver work for you…
[b]When you think about float buffer mode, a good analogy is to imagine it as a step as big as the step from color index mode to RGBA mode. Some of the stuff that the previous mode did doesn’t make sense in the new mode.
It is entirely plausible that it might never make sense to support old-style blending in combination with float buffers.[/b]
hm… okay, old-style blending is not really needed. not all of it, at least… but simple modulation with the framebuffer, or addition, is quite useful… but then, i remember we don’t have a real framebuffer anyway… sort of funny… how do we actually draw onto a floating-point buffer? we have 4 outputs…
oh, and it’s not as big a step as the one from 8bit to 32bit. the math on the software side stays the same for most things… one thing that changes is the clamping… so we can now have full dynamic range on_parts_of_the_rendering_pipeline. i thought you would support a full floating-point version of opengl… instead you provide some float rendertargets, and that’s it. no real float textures, no real float screen mode, actually…
[b]And it is virtually guaranteed that filtering of float textures, even if eventually supported, will lead to large slowdowns.[/b]
well… filtering… isn’t that actually… a·b + c·d + e·f + g·h… with a,c,e,g being the filtering kernel and b,d,f,h the four samples? isn’t that just a DP4 instruction? i don’t see the problem… the filtering kernel can be generated about the same way as before…
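the dot-product view of filtering above can be written out explicitly. a small python sketch (function names are mine; the weight construction from the fractional texel position is the standard bilinear formula):

```python
def bilinear_weights(fx, fy):
    """Kernel for a DP4-style bilinear filter: four weights that
    sum to 1, built from the fractional texel position (fx, fy)."""
    return [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]

def dp4(a, b):
    # one DP4 instruction: a0*b0 + a1*b1 + a2*b2 + a3*b3
    return sum(x * y for x, y in zip(a, b))

samples = [1.0, 3.0, 5.0, 7.0]          # the four texels
w = bilinear_weights(0.5, 0.5)          # dead centre of the texel quad
print(dp4(w, samples))   # 4.0, the average of the four samples
```

of course this ignores the cost of the four point samples themselves, which is the part mcraighead is presumably worried about.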
[b]Condition code modifiers do not make instructions any slower. This can lead to nice speedups over the older “SGE/MUL/MAD” approach. (It also gets rid of the whole “0 * NaN = 0, 0 * Inf = 0” annoyance.)[/b]
that’s fine… i know it speeds things up by removing some instructions. but if, in exchange, each individual instruction executed more slowly, that would be… well… not that nice…
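for readers who haven’t written shaders that way: the older “SGE/MUL/MAD” trick builds a branchless select out of a compare-to-0/1, a multiply, and a multiply-add. a sketch of it, including the 0·Inf annoyance mcraighead mentions (function names are mine):

```python
import math

def sge(a, b):
    # SGE: set-on-greater-or-equal, result is 1.0 or 0.0
    return 1.0 if a >= b else 0.0

def select_sge_mul_mad(a, b, x, y):
    """Branchless 'x if a >= b else y' via SGE/MUL/MAD: r = s*x + (1-s)*y."""
    s = sge(a, b)
    return s * x + (1.0 - s) * y

# works fine for finite values...
print(select_sge_mul_mad(2.0, 1.0, 10.0, 20.0))        # 10.0

# ...but under strict IEEE rules 0 * inf = nan, so the masked-out
# operand can still poison the result:
print(select_sge_mul_mad(2.0, 1.0, 10.0, math.inf))    # nan
```

condition-code write masks avoid both the extra instructions and the need for the hardware to define 0·Inf and 0·NaN as 0.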
Originally posted by Nakoruru:
[b]Is it just me, or does it seem like with all these extensions I should call this nVidiaGL, because it seems that I could write a program that uses almost no standard OpenGL by using nVidia’s extensions. The only standard thing left seems to be texture objects![/b]
is it just me, or are the fragment programs quite difficult to code for, since each register can hold a) one float, b) two half floats, or c) two fixed-point values, plus the branching and all that stuff? is it just me, or do we now need to use cg to make our code readable again?
and why is NV_vertex_program2 not based on ARB_vertex_program, with the additional instructions on top? that would be cleaner imho…
well, that’s about all i have to rant about for now. i just want to say that the nv30 will, sometime in the future, be quite good hardware, from what i can see so far… i just think we’re still quite far from perfect… further than i actually thought before reading those specs…