Official nVidia ForceWare 52.16 and nothing about GLSL

Originally posted by Korval:
Not really. You have to budget your silicon. Do you want to simply not give them float buffers at all, or are you willing to spend a small quantity of silicon on a specialized case of float textures (which covers >50% of the performance-applicable uses of float textures: both shadow mapping and float render targets)?

A-ha, nice to see someone second that rendering to pbuffers with pow-2 textures is slower than to framebuffers and rectangles…

It’s just another sampling unit that accesses data at a 4x larger stride. Sampling that amount of data is no problem (bilinear filtering samples as much). Sampling from another place is no problem (the hw is already able to sample from 8-bit, 16-bit and 32-bit textures). Sampling point-sampled float values is no problem. texture_rectangles prove this.

Like you said, you don’t understand hardware. I’m a programmer, and I have a better understanding of hardware than this. You’re a programmer (right?), so you should know that nothing you just said absolutely needs to be true.

If this were software, you could easily imagine an implementation where none of the above is the case. I’m sure you’ve seen code written where the equivalent of adding “another sampling unit” would be a pain to implement because of the design of the preexisting code. I’m sure you’ve seen code where the equivalent of “sampling the amount of data” is a very difficult proposition, for a number of reasons.

In short, as any real programmer knows, saying something is often a lot easier than getting it done.

Sure, you could rewrite the base code, but that takes a long time. Longer than nVidia might have wanted their development cycle to take.

They wasted their silicon on the wrong things. That’s all.

Your partisan attitude is getting a little tiresome. Just because ATi made the features that you wanted doesn’t make their design choices any more correct than nVidia’s.

And making normals with only 8 bits per channel is not that good.

Actually, you picked a vertex component where making them 8-bit wouldn’t be so bad. 8-bit normals are pretty decent for describing a direction. Not terribly great, but certainly liveable. Positions and texture coordinates, by contrast, are far less likely to work in 8-bit resolution.
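To put a number on it, quantizing and feeding 8-bit normals is about a three-line affair (a rough sketch; the helper and array names are made up):

[code]
/* Hypothetical sketch: quantize unit normals to signed bytes and let GL
 * rescale them back to [-1,1] via glNormalPointer(GL_BYTE, ...).
 * Worst-case angular error is only a fraction of a degree. */
#include <GL/gl.h>

typedef struct { GLbyte x, y, z; } byte_normal;

static byte_normal quantize_normal(float nx, float ny, float nz)
{
    byte_normal n;
    n.x = (GLbyte)(nx * 127.0f);
    n.y = (GLbyte)(ny * 127.0f);
    n.z = (GLbyte)(nz * 127.0f);
    return n;
}

/* usage: fill an array of byte_normal per vertex, then:
 *   glNormalPointer(GL_BYTE, sizeof(byte_normal), normals);
 *   glEnableClientState(GL_NORMAL_ARRAY);                   */
[/code]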

Yeah, superbuffers by default would allow creating all sorts of float buffers.

Not likely. There is no ARB-based floating-point texture support. As such, they would have to build a second extension that allowed for floating-point textures.

Also, there’s no point in preventing older hardware that is incapable of float texture support (pre-R300/NV30) from using ARB_superbuffers. As such, the ARB_superbuffers extension ought not require the existence of this float-texture extension.

This could be the reason why superbuffers aren’t there yet, as nVidia is normally quick to add new extensions.

That, or that the spec isn’t complete. Considering that there has been little mention of the spec in recent months, that seems far more likely.

Besides, nVidia doesn’t have enough friends on the ARB to tie up an important extension like this for several months.

And yes, I think ARB_pixel_buffer_object will be the subpart that lets us have at least part of the fun of superbuffers.

And which part is the “fun” part? The ability to use rendered data as vertex data? How, precisely, is that “fun”? And who decided it was “fun”?

I, for one, much prefer the thought of virtually free render-to-texture to merely using a texture as source data for vertices.

Your claim to have a better understanding of hw than I have is taken as a personal offense and dropped as such.

ATI showed it’s possible to design hw in a generic and logical way. nVidia failed to show that with the NV30, and now has to patch around it in software with ultra-complex drivers. They are starting to succeed with that, which is a good thing.

Why could this be the fun part? Because for the first time it would allow us to write very complex algorithms on data that doesn’t only mean vertex-in, pixel-out. Instead, we can loop around, process indirect data accesses, and do much more complex things. It will allow a much higher degree of freedom than today, where all we can do is reconfigure some parts of the fixed pipeline structure. Superbuffers will start to unlock that forced pipeline structure.

We are forced into this structure by the driver, not by the hw. Superbuffers will free us from it. This is a good thing.

And isn’t the fun part when we can just play around with new, never-thought-about ideas and approaches? If not, what is the fun then? Coding yet another dot3 per-pixel lighting and stencil shadow demo like everyone else? Yes, the GeForce FX is great for that, as you can use fixed point and UltraShadow there. That doesn’t make the hw advanced or future-proof in any way.

If you want to create hw that is useful for a lot of tasks, and future-proof, you HAVE to design it in a generic way. Implement features independently of other features, so that they can get used in not-yet-thought-of ways. Float textures are one such thing. nVidia failed there.

more specifically:

Your partisan attitude is getting a little tiresome. Just because ATi made the features that you wanted doesn’t make their design choices any more correct than nVidia’s.

Show me one nVidia feature that is really usable for the games of tomorrow, and not just “a bit better” than what we had before anyway.
Fast fixed-point units with 12 bits? We had 10-bit and 9-bit before, Doom 3 uses all of them depending on the hw, and it doesn’t really look better or worse on any of them.
Huge shaders? First they should be able to run them fast enough. I think every gamer would have preferred smaller but usable shaders that run fast. Not that I wouldn’t like bigger shaders.
Float support? As David Kirk stated, they thought 16 bit would be enough (but his assertion was wrong, and based on wrong “facts” about cinematic quality). They also somehow thought they only needed float textures for rect render targets. Again, wrong.
Anything else?

The GeForce FX has fewer features than an R300 or newer card. Fixed point is not an additional feature; it’s a step between older and newer features, a step that would be useless if the new features were fast enough.

I think the GFfx shaders are more than fast enough for most purposes. They don’t need to be the “fastest” to be useful. It’s your job as a programmer to make the best use of the hardware or, as you seem to prefer, Dave, to ignore the audience that uses that hardware. A graphics programmer stuck in “benchmark” mode is traveling a never-ending road.

I don’t ignore the audience that uses that hardware. I ignore hw features that don’t have a big audience. That means I currently don’t use floating-point textures much, nor other proprietary extensions.

I don’t care about best performance either. But I do care about hw that doesn’t work well in default mode while working much better ONLY in proprietary modes that don’t have a future. The GeForce FX has looked like this since its very beginning, and while it looks better now, it still looks like the hw had some very bad design choices, way off any real logic.

All we want is a usable OpenGL on every platform, so that we can easily develop fast, working code, no?

Dave needs a treatment, not an argument.

He has undergone a year-long, self-induced brainwashing process, and his mind has fallen into an infinite loop. Please ignore his babbling (arguing will only cause an escalation of the symptoms), and maybe a self-healing process in the brain tissue will start eventually.

Bah… I’ve had brainwashing from nVidia long enough. Wash yourself.

Originally posted by DFrey:
It’s your job as a programmer to make the best use of the hardware <…>
Yes, it is. Just as much as it’s the IHV’s job to provide hardware.

Who’s going to write NV_fragment_*** paths anyway? Devs who own NV3x cards, perhaps. I won’t. Sure, I include an ARB_precision_hint_fastest option in each and every fragment program I eventually hand to the driver, but I can’t be arsed to cater for a three-datatype design choice that was clearly unnecessary.
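For the record, that hint is just one line in the program text; the rest is the usual ARB_fragment_program loading boilerplate (a rough sketch, the shader body is only a placeholder and the entry points are assumed to be resolved already):

[code]
#include <GL/gl.h>
#include <GL/glext.h>
#include <string.h>

static const char fp_src[] =
    "!!ARBfp1.0\n"
    "OPTION ARB_precision_hint_fastest;\n"  /* let the driver drop precision */
    "MOV result.color, fragment.color;\n"
    "END\n";

static GLuint load_fragment_program(void)
{
    GLuint id;
    glGenProgramsARB(1, &id);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, id);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(fp_src), fp_src);
    return id;
}
[/code]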

It’s not my job to cover up for weak design choices. The NV_fragment extensions don’t offer anything over ARB_fragment_program that I can appreciate, and they are surely temporary in nature. I hope that, eventually, NVIDIA hardware will just drop the multi-type thing.

Originally posted by zeckensack:
Who’s going to write NV_fragment_*** paths anyway? Devs who own NV3x cards, perhaps. I won’t. Sure, I include an ARB_precision_hint_fastest option in each and every fragment program I eventually hand to the driver, but I can’t be arsed to cater for a three-datatype design choice that was clearly unnecessary.

Me. You know why I bought an FX and not a Radeon? Because I WANT to write that NV30_ s*
A proprietary API is a must - we know that from evolution theory. There is not one leading 3D company but two of them - and by the way, I find NV_fragment_program much more powerful than the ARB version. ATI has some very amusing features that I lack (like MRT), but the Radeon is a mainstream card, while the FX provides more freak-like hardware. And I find the fixed-point solution simply perfect. Nvidia created a new-generation, universal-processor piece of hardware, while ATI is still going with register combiner stuff. That’s why the Radeon is faster. New hardware approaches must be tested for usability first: you can’t make it perfect the first time. Nvidia did a great - not too great (they could have taken more time) - but still a great job. Pretending to be a freak myself, I like the FX more than ATI.

[This message has been edited by Zengar (edited 10-28-2003).]

Originally posted by Zengar:
Me. You know why I bought an FX and not a Radeon? Because I WANT to write that NV30_ s*
A proprietary API is a must - we know that from evolution theory.

But a proprietary API as the ONLY REALLY USABLE WAY is NOT a MUST. And that was the case for the FX until the latest drivers.


There is not one leading 3D company but two of them - and by the way, I find NV_fragment_program much more powerful than the ARB version.

Yes. But much more clumsy, too… or do they have ALIAS and TEMP in it now? nVidia is not really good at designing exts imho. Much too low-level, forcing YOU to do repetitive tasks a good API should take care of. OpenGL does, the NV exts don’t.
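Roughly the kind of difference I mean, as far as I remember the two languages (a hedged sketch, the math itself is a made-up example): ARB_fp gives you named temporaries, NV_fp makes you juggle fixed register names and per-instruction precision suffixes yourself.

[code]
static const char arb_style[] =
    "!!ARBfp1.0\n"
    "TEMP diffuse;                           # named temporary\n"
    "DP3 diffuse, fragment.texcoord[1], fragment.texcoord[2];\n"
    "MUL result.color, fragment.color, diffuse;\n"
    "END\n";

static const char nv_style[] =
    "!!FP1.0\n"
    "DP3R R0, f[TEX1], f[TEX2];              # you allocate R0/H0 yourself\n"
    "MULR o[COLR], f[COL0], R0;\n"
    "END\n";
[/code]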

ATI has some very amusing features that I lack (like MRT), but the Radeon is a mainstream card, while the FX provides more freak-like hardware.

And what do you target? Gamedev goes where the money is: mainstream. Always. That’s ALWAYS first. Who cares about freaks? They can take care of themselves anyway.
And calling the features “amusing” is rather primitive. You obviously never used an ATI… else you would feel so happy as a freak running one…


And I find the fixed-point solution simply perfect.

For what?

Nvidia created a new-generation, universal-processor piece of hardware, while ATI is still going with register combiner stuff.

That’s why Radeons don’t have any fixed function in them anymore. That’s why they moved completely to floating point, emulating everything that came before.
And that’s why the NV30 is still based on the NV20 core, with some extensions to the texture shaders which happen to be able to do some ARB_fp tasks. If you know how the hw works, you know what it’s based on.


That’s why the Radeon is faster.

No. It’s because they designed something based on KISS. That works for hw, too: just drop all the needless features if you can emulate them with a powerful, simple and, thanks to that, very fast core. Their concept proved to be true.

New hardware approaches must be tested for usability first: you can’t make it perfect the first time.

The NV30 is based on the NV20, which is based on the NV10. The GeForce series is a long line, well known to be good. They haven’t put in many new hw approaches, but they definitely failed at proper pretesting.

Nvidia did a great - not too great (they could have taken more time) - but still a great job.

Who exactly? The marketing people getting everyone to believe the GeForce FX is the future, keeping them from buying an ATI for over a year, and still not providing anything better a year later?
The driver developers exploring all sorts of hacks and cheats to help marketing hide what really happened?
The hw developers who realised that their hw isn’t really made for the new DX9 features?
Who did a great job?
Or do you mean the fan designers?

Pretending to be a freak myself, I like the FX more than ATI.

Being a freak is nothing to be proud of. I’m one, too. And being a freak normally means you want something: no matter what, you want it, and you will realise that wish. What I want is programmable hw which can break up the ordinary pipeline to expose NEW features. Pixel shading I’ve done on the GF2; THERE fixed point was cool. But that was years ago.

We’re past that age, we really are. If you’re looking for something new, look at what the others provide.

[This message has been edited by Zengar (edited 10-28-2003).]

[This message got responded to by davepermen (response time: early in the morning, just got to work… and still ****ing tired… oh, the phone rings… ****, have to go…).]

Yes. But much more clumsy, too…

I’ll take powerful over clumsy…

Unless it’s prohibitively clumsy. Having to do your own register allocation is hardly prohibitive.

That’s why Radeons don’t have any fixed function in them anymore.

We’ve had this discussion before, and you lost then. From the unusual limitations of the R300 architecture (the texture dependency limits?), we know that it is little more than an R200 with floating point (that, and dual-issue of texture and ALU instructions, assuming the R200 didn’t have that) and 2 more passes. That means that the R300 is just as much fixed-function as the R200. It just happens to get disguised by the ARB_fp extension.

And, it is still my belief that the R300’s fp architecture cannot survive; I believe that it must change and become something more like the FX’s if conditional branching is ever going to be economical in their architecture. I almost guarantee that the FX’s fp architecture will be able to make the switch to conditional branches more easily than the R300’s. Just look at the odd looping and so forth that it already does, compared to the basic fixed functionality of the R300.

Think of it like this. When I first read about the R200 hardware, I came to the conclusion that this same basic architecture could last through the next generation of hardware. That it was flexible enough to handle what were, at the time, advanced fragment programs with lots of texture accesses and so forth. The NV20 architecture, by contrast, was a hack on top of the RC architecture of the NV10. It clearly could not last in a more programmable environment.

Even before the NV_fp spec was published, back when the 9700 just came out, I knew that the NV30 would have more powerful per-fragment operations with fewer limitations than the 9700. Which is undeniably true.

Now look back. As far as performance was concerned, the R200 couldn’t hold a candle to the NV2x line. Now, the R300 is faster than the revamped NV3x line (though the gap has narrowed significantly). Because nVidia doesn’t have to do a massive rearchitecting of their per-fragment pipe (like the one ATi put off in the upgrade from R200 to R300), they can focus their efforts on performance. This is what ATi did in their upgrade from R200 to R300.

I almost guarantee that you will find that the Pixel Shader 3.0 features will run faster on the NV40 than the R400 because of this. Sheer speculation, I know, but it does make sense.

That’s why the NV30 is still based on the NV20 core, with some extensions to the texture shaders which happen to be able to do some ARB_fp tasks. If you know how the hw works, you know what it’s based on.

Which explains why you don’t. As you said yourself, you don’t understand the hardware.

Clearly, the NV30 is not based on the NV20’s texture shaders. That much is obvious from this site’s (highly accurate) analysis of the NV30 architecture. The NV30’s fragment programs are something that is very different from the texture shaders of the NV20.

No. It’s because they designed something based on KISS.

Nowadays (ie, for NV35+ hardware), there are only 2 reasons why the Radeons are faster. Dual issue of texture and ALU ops, and the NV3x’s difficulties with temporary registers. Their actual floating-point computational speed is virtually identical, at least for most operations.

As such, you can no longer say that the NV35 isn’t designed for floating-point fragment operations.

Who exactly? The marketing people getting everyone to believe the GeForce FX is the future, keeping them from buying an ATI for over a year, and still not providing anything better a year later?

Once again, you conveniently forget the fact that the two were in development concurrently. The FX slipped its release date significantly, which makes ATi’s lead look like much more than it is.

Besides, nVidia doesn’t have to offer something better. They merely need to match what ATi offers. And, for the most part, they have done that with their newest cards. Granted, the high-end FX isn’t 100% of its ATi counterpart (and FXs tend to lose in antialiasing/anisotropic performance comparisons), but they are close. It is not unthinkable or idiotic to pick up an FX today.

The driver developers exploring all sorts of hacks and cheats to help marketing hide what really happened?

Are you living in the 4-month-ago days where cheats would be discovered in nVidia drivers almost daily? Their image quality problems are gone (save for actual hardware differences). This statement may have been true once upon a time, but it isn’t now. And your constant harping about things that are no longer the case is really annoying.

The FX’s performance gains these days are due to better compilers in the newest drivers, and less due to cheating.

What I want is programmable hw which can break up the ordinary pipeline to expose NEW features.

Good for you. I’m a performance hog. That is why I very much like the “ordinary pipeline”: it’s fast. I don’t like multipass, because it’s slower than single-pass. I don’t like float buffers, because they’re slow. I absolutely despise ARB_shadow being slow. I don’t like the vast majority of features that adversely affect framerate.

What you want influences what hardware you’re going to use.

That’ll get too long…

Unless it’s prohibitively clumsy. Having to do your own register allocation is hardly prohibitive.

I’m just talking about the shading language. ARB_fp is much nicer to use than NV_fp. NV_fp is, logically, more powerful.

(the whole text up to …) I almost guarantee that you will find that the Pixel Shader 3.0 features will run faster on the NV40 than the R400 because of this. Sheer speculation, I know, but it does make sense.

I guess you’ve read it on Beyond3D, too, which clearly shows the NV30 IS based on the NV20. The fragment programs are part of the original texture shaders; that’s why the NV30 has issues with sampling textures and doing floating-point ops at the same time (and a possible reason for the not really good float sampling support for textures, dunno). They essentially built loops around the original texture shaders, and extended them. The same is documented in your link. They DID extend it by a lot, but the BASE was there. And it’s about the only extension nVidia did, plus looping in the vp.

The NV40 CANNOT be based on the NV30. The fp units of the NV30 are NOT capable of doing any sort of branching, and will never be extendable to do it. It processes dependent quads; it would have to process independent fragments instead.

But I know the R400 cannot be based on the R300 either.

[b]Which explains why you don’t. As you said yourself, you don’t understand the hardware.

Clearly, the NV30 is not based on the NV20’s texture shaders. That much is obvious from this site’s (highly accurate) analysis of the NV30 architecture. The NV30’s fragment programs are something that is very different from the texture shaders of the NV20.[/b]

Not very different, no.

[b]Nowadays (ie, for NV35+ hardware), there are only 2 reasons why the Radeons are faster. Dual issue of texture and ALU ops, and the NV3x’s difficulties with temporary registers. Their actual floating-point computational speed is virtually identical, at least for most operations.

As such, you can no longer say that the NV35 isn’t designed for floating-point fragment operations.[/b]

Even David Kirk says so…
And IF it had been made for fp, why would they plug in a twice-as-fast fixed-point path and waste tons of silicon on it? They could have reused that silicon for additional fp pipes. Guess which would look much better in DX9 and ARB_fp?

Besides, nVidia doesn’t have to offer something better. They merely need to match what ATi offers. And, for the most part, they have done that with their newest cards. Granted, the high-end FX isn’t 100% of its ATi counterpart (and FXs tend to lose in antialiasing/anisotropic performance comparisons), but they are close. It is not unthinkable or idiotic to pick up an FX today.

As nVidia always hypes Moore’s law happening every half a year, it should by now provide a GPU that is 4 times as fast as the Radeon 9700 Pro. Technically.

And the cards (most of them) are still bigger and consume more energy than the Radeons. Another big reason not to buy them. Not all of them, but the high end at least.

The FX’s performance gains these days are due to better compilers in the newest drivers, and less due to cheating.

Yes. But nVidia ****ed up any sort of trust from its customers with what they did. Any other company would get sued for such behaviour. For billions…

Good for you. I’m a performance hog.
Who isn’t?

That is why I very much like the “ordinary pipeline”: it’s fast. I don’t like multipass, because it’s slower than single-pass. I don’t like float buffers, because they’re slow. I absolutely despise ARB_shadow being slow. I don’t like the vast majority of features that adversely affect framerate.

Well… if you don’t like any graphics, get a Game Boy. I’m past the age of simplistic rasterized graphics looking like the PSX or N64. Drawing textured triangles isn’t something nice by itself.

I don’t like slow things either. That’s why I like the ATI solution. Multipass works fast, float buffers work fast, ARB_shadow I don’t use, but I can do it on my own with an fp, fast.
Nobody needs more fps than the screen can display. If I have spare frames, I accumulate them for motion blur.
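Something like the classic accumulation-buffer trick; a rough sketch, where the scene callback and the 1/60 s frame time are placeholders:

[code]
#include <GL/gl.h>

/* render a few sub-frames spread over the frame's time span and average them */
void draw_motion_blurred_frame(void (*draw_scene)(float t), float t, int subframes)
{
    glClear(GL_ACCUM_BUFFER_BIT);
    for (int i = 0; i < subframes; ++i) {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        draw_scene(t + (i * (1.0f / 60.0f)) / subframes);
        glAccum(GL_ACCUM, 1.0f / subframes);   /* add 1/n of this sub-frame */
    }
    glAccum(GL_RETURN, 1.0f);                  /* write the average back    */
}
[/code]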

And guess what? I don’t have to work low-level and tweak my shaders and all that fuss to get my stuff running fast. I just use OpenGL the way it’s meant to be used, and it all performs well, no matter what I do.

Why restrict yourself to the ordinary pipeline, not looking around any further than a FOV of about 2°? Open your mind, dude, and learn to use your hw. If it’s the right hw, it can handle it all fast. If it’s the wrong one, you bought the wrong card.

I would never buy a card that forces me to limit my freedom to get something done.

I would never buy a card that performs badly at normal tasks.

A GPU should render; that’s its normal task. If it can’t do that fast in all sorts of situations, it’s a bad GPU.

Originally posted by davepermen:
That’ll get too long…

But this is starting to be fun now

I guess you’ve read it on Beyond3D, too, which clearly shows the NV30 IS based on the NV20. The fragment programs are part of the original texture shaders; that’s why the NV30 has issues with sampling textures and doing floating-point ops at the same time (and a possible reason for the not really good float sampling support for textures, dunno).

Dave, I am very sorry, but I have read almost all the Beyond3D topics and I don’t understand how you come to your conclusion. The NV30 is based on the NV20 shaders if we consider the memory manager, yes. But the engine itself is quite different.

Back to fixed point: a compromise between performance and quality. Fixed point will always be faster than floating point, and you don’t need floating point everywhere. That’s shooting flies with a shotgun. Also, you don’t need floating-point textures everywhere. For cubemaps you can use RGBE textures, which are not slower than floating-point textures, consume much less memory and lose almost no precision.
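A minimal CPU-side sketch of the RGBE idea - Greg Ward's shared-exponent encoding - just to show how cheap it is; the helper names are made up:

[code]
/* Three mantissa bytes sharing one exponent byte, so an HDR texel fits in a
 * plain RGBA8 texture. */
#include <math.h>

void float_to_rgbe(unsigned char rgbe[4], float r, float g, float b)
{
    float v = r;
    if (g > v) v = g;
    if (b > v) v = b;
    if (v < 1e-32f) {
        rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
    } else {
        int e;
        float scale = frexpf(v, &e) * 256.0f / v;  /* v = m * 2^e, m in [0.5,1) */
        rgbe[0] = (unsigned char)(r * scale);
        rgbe[1] = (unsigned char)(g * scale);
        rgbe[2] = (unsigned char)(b * scale);
        rgbe[3] = (unsigned char)(e + 128);
    }
}

void rgbe_to_float(float rgb[3], const unsigned char rgbe[4])
{
    if (rgbe[3] == 0) {
        rgb[0] = rgb[1] = rgb[2] = 0.0f;
    } else {
        float f = ldexpf(1.0f, (int)rgbe[3] - (128 + 8));  /* undo shared exponent */
        rgb[0] = rgbe[0] * f;
        rgb[1] = rgbe[1] * f;
        rgb[2] = rgbe[2] * f;
    }
}
[/code]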

I got up early in the morning too… Guess I’ve become too lazy lately… Looking forward to making a nice breakfast and cleaning my flat.

Greets

[This message has been edited by Zengar (edited 10-29-2003).]

But this is starting to be fun now

… Welcome To The Club!!! …

Isn’t that WHY we’re doing it? In the end, the ones that sell best will win anyway, not the ones with the best-made hw. It just happens to be the same sometimes.

Dave, I am very sorry, but I have read almost all the Beyond3D topics and I don’t understand how you come to your conclusion. The NV30 is based on the NV20 shaders if we consider the memory manager, yes. But the engine itself is quite different.

It’s funny that we actually both know the same things and come to such different conclusions about how the hw works. I have no clue how to resolve that; you don’t have one either…

But one thing is for sure: the NV30 and R300 will not be capable of surviving into the NV40 or R420 if they want PS 3.0 to work. NEITHER is capable of supporting PS 3.0 efficiently. The NV30 is possibly more capable of doing it AT ALL, but not efficiently.

Back to fixed point: a compromise between performance and quality. Fixed point will always be faster than floating point, and you don’t need floating point everywhere. That’s shooting flies with a shotgun. Also, you don’t need floating-point textures everywhere. For cubemaps you can use RGBE textures, which are not slower than floating-point textures, consume much less memory and lose almost no precision.

Same as everywhere else. So do you use a fixed-point vector class and do math with it in certain situations on the PC? You could even run it in parallel with an SSE task, for example, so you get the best performance of all…
Then again, on CPUs it’s not worth the trouble… we moved to floats long ago, and practically no one in gamedev uses fixed point anymore for most math. Image processing, yes, but even there, with HDR coming into fashion more and more, fixed point is becoming merely a storage compression solution rather than a calculation solution…

Yes, technically, fixed point will always be faster. But who cares, if floats are FAST ENOUGH? They are, on CPUs. And they are, on my Radeon. And they give me much more flexibility, and the ability to handle my tasks at a higher level.

I don’t need floating-point textures everywhere. But I need SUPPORT for them everywhere. By default, I always end up wanting them exactly where some arbitrary restriction forbids their use. I had per-pixel normalized diffuse and specular bumpmapping on a GF2, but there was one point where I was not able to do a _x2 at the end, and so the whole image was at half brightness… that made me SO angry!..

I know the workarounds, I can work with fixed point, I DO work with fixed point even! (You know I’m coding mixed software/hw raytracers as a hobby… so I DO care about speed as much as I care about flexibility.) And I DO use smaller texture formats if they fit my needs.

I’m not stupid.

Just as I never accepted that the GF3 has programmable pixel shaders. I wasn’t able to do

attenuation = tex1D(dot3(point_to_light, point_to_light)).
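Spelled out in ARB_fp terms, that one-liner is just a DP3 plus a dependent 1D lookup; the register and texture-unit choices here are only for the example:

[code]
static const char attenuation_fp[] =
    "!!ARBfp1.0\n"
    "TEMP d2, atten;\n"
    "# point_to_light interpolated in texcoord set 1\n"
    "DP3 d2, fragment.texcoord[1], fragment.texcoord[1];\n"
    "# attenuation = tex1D(d2), falloff table on unit 2\n"
    "TEX atten, d2, texture[2], 1D;\n"
    "MUL result.color, fragment.color, atten;\n"
    "END\n";
[/code]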

You agree with me that it was a fixed-point extension… you said so above.

I’m interested in why you think the NV30 is so much more advanced than the Radeon. In features, it’s not. Why in hw? What does it have that the R300 doesn’t, that I’m not aware of?

It has a computation unit and a loop surrounding it. Wow. It can loop more than ATI’s… it has a much more complex computation unit than ATI’s…

But I don’t get where the big difference is…

The fragment programs are part of the original texture shaders; that’s why the NV30 has issues with sampling textures and doing floating-point ops at the same time (and a possible reason for the not really good float sampling support for textures, dunno).

The reason you think that texture shaders have anything to do with NV_fp is that the unit that does texture addressing is the same one that does computation? That’s like saying that a Pentium 4 is in any way similar to the Motorola 68B09E (the chip that powered the TRS-80 Color Computer 3) because they both stall when they access memory.

In any case, it’s a tradeoff: No dual-issue of texture and ALU, but no dependency restriction either.

The fp units of the NV30 are NOT capable of doing any sort of branching, and will never be extendable to do it. It processes dependent quads; it would have to process independent fragments instead.

To an extent, yes. To an extent, no.

While I agree that it does need to be able to have 4 separate execution units, you have to admit that this is much simpler than what ATi needs to do to get the same thing. All nVidia needs to do is build this same architecture into 4 pipelines, rather than 1 architecture that operates on 4 data chunks. That should be much easier than building some kind of arbitrary looping functionality into an R200-style multi-pass design.

As nVidia always hypes Moore’s law happening every half a year, it should by now provide a GPU that is 4 times as fast as the Radeon 9700 Pro. Technically.

Droning by marketing people means nothing in terms of a debate on hardware.

Who isn’t?

Well, anyone who is willing to use floating-point buffers (this as opposed to floats in fragment programs) as either render targets or texture sources, as these eat up tremendous bandwidth.

I’m past the age of simplistic rasterized graphics looking like the PSX or N64. Drawing textured triangles isn’t something nice by itself.

The foundation of good graphics is well-textured triangles. That must come first. Without this foundation, no quantity of float buffers, HDR effects, or bump mapping will ever help those graphics.

For games, hi-res textures will serve a game’s graphics far more than any HDR, because they add detail. And it is these details that improve the plausibility of the world far more than the most amazing HDR effect. We can start talking about HDR and so forth when I can put the camera right next to a wall and not see any form of filtering artifacts; when the texture on that wall is so hi-res that it’s popping out of the screen.

You may not think that mere textures are important, but that may be because textures aren’t something that programmers do. As graphics programmers, we don’t create textures. Nowadays, we make interesting shaders. And, as interesting as they may be, they must still be backed up by good, high-quality art. I say that we graphics programmers should get out of the way of the artists until their job is done. Then, we should use the resources that they have left us (or negotiate for more resources) to create interesting shaders.

In order to have graphics that looks like something more than a triangle with some effect applied to it, you need high-quality texturing. Otherwise, you’re just making a demo.

float buffers work fast

If you consider eating 4x the bandwidth (or more, vs. S3TC) fast. You can do much more interesting things with that bandwidth, like using bigger/more textures, or adding more detail by rendering more models.

ARB_shadow I don’t use, but I can do it on my own with an fp, fast.

Now you’re just making things up.

ARB_shadow requires a bilinear blend between 4 neighboring texels (though not in the same way as a bilinear filtering operation). As such, any ARB_fp implementation requires at least 4 texture ops, which, on an ATi card, means a minimum of 4 cycles. This is vs. the half-cycle that nVidia’s cards offer. And that’s just the minimum; I don’t know precisely what the code would require off the top of my head, but I’d guess somewhere around 6 or so. That’s 6 instructions out of 32 possible paired texture/ALU instructions, compared to half an opcode.
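To make the point concrete, here is roughly what “doing ARB_shadow yourself” looks like in ARB_fp: four point taps compared and averaged, without even the proper bilinear weighting, and already more instructions than my estimate above. The texture unit, coordinate set and parameter slot are made up, and it assumes the depth map is bound as a plain (non-compare) LUMINANCE texture:

[code]
static const char manual_shadow_fp[] =
    "!!ARBfp1.0\n"
    "PARAM texel = program.local[0];       # (1/width, 1/height, 0, 0)\n"
    "TEMP proj, uv, stored, lit, sum;\n"
    "RCP proj.w, fragment.texcoord[3].w;   # perspective divide\n"
    "MUL proj, fragment.texcoord[3], proj.w;\n"
    "TEX stored, proj, texture[3], 2D;     # tap 0\n"
    "SGE sum.x, stored.x, proj.z;          # 1 = lit, 0 = shadowed\n"
    "ADD uv, proj, texel.xzzz;             # tap 1: +1 texel in s\n"
    "TEX stored, uv, texture[3], 2D;\n"
    "SGE lit.x, stored.x, proj.z;\n"
    "ADD sum.x, sum.x, lit.x;\n"
    "ADD uv, proj, texel.zyzz;             # tap 2: +1 texel in t\n"
    "TEX stored, uv, texture[3], 2D;\n"
    "SGE lit.x, stored.x, proj.z;\n"
    "ADD sum.x, sum.x, lit.x;\n"
    "ADD uv, proj, texel.xyzz;             # tap 3: +1 texel in s and t\n"
    "TEX stored, uv, texture[3], 2D;\n"
    "SGE lit.x, stored.x, proj.z;\n"
    "ADD sum.x, sum.x, lit.x;\n"
    "MUL sum.x, sum.x, 0.25;               # average the 4 comparisons\n"
    "MUL result.color, fragment.color, sum.x;\n"
    "END\n";
[/code]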

You may as well say that hardware developers should start removing all texture filtering from their hardware since you can just write it in the fragment program.

Just because you can do a thing doesn’t mean that you should. There is the right way to solve a problem, and “the way I have to because somebody didn’t do it for me”. nVidia did it the right way.

I would never buy a card that performs badly at normal tasks.

Who is to say what a “normal” task is? What you consider normal is clearly different from what many people consider normal.

Also, you don’t need floating-point textures everywhere.

You don’t need floating-point textures, period. Especially in performance applications, where even using a 32-bit texture, as opposed to S3TC, can degrade performance.

Good art doesn’t make a bad renderer good. Bad art doesn’t make a good renderer bad.

3ds Max (version 4, it was) was, most of the time, a fixed-point, non-HDR rasterizer. Believe me, my artist whined about it while doing art… colour clamping artifacts, improper reflections, etc… a complete mess whenever you started on something complex.

Our task is programming a good solution to the rendering problem. Fixed-point rasterizing by itself isn’t one.

You’re being ridiculous, claiming that floating point is unneeded. In the MATH part, it IS needed, for any moderately complex shader. In the STORAGE part, it is required to be supported, because sometimes you NEED uncompressed data. For example, if you want to do the two-pass displacement mapping algorithm that has been shown quite a few times now. You don’t want to compress your displaced vertices with RGBE, do you?

On a Radeon you can, for example, expand at runtime those compressed meshes shown at SIGGRAPH, where a mesh gets unwrapped onto a 2D map. You can on nVidia, too, just not with a normal 2D texture; it has to be a rectangle… THAT is a ridiculous restriction imho, and there is NO reason that justifies something like that. It doesn’t make sense, does it?

Sure, there ARE reasons. I mean, it happened; it is a fact that it’s like that today. But the reasons are not GOOD. It shouldn’t have happened.

And your claim that floating point is useless is wrong. You would know that if you had ever been able to work with real renderers…

And I know the other extreme… coding fixed-point raytracers… fun, yes; usable? No way.

You are very restricted in your view of what you can do on your GPU. And you believe you’re right that this is the only way it will ever be. Believe me, you’re wrong. Everything changes.

And high-res textures are not the important thing. They don’t matter. Q3 has ultra-high-res textures. At 1280x1024 with 4x FSAA and 16x AF, they look perfectly sharp!!! I mean, I’m standing on a huge, detailed, perfectly sharp… FLAT TRIANGLE. It’s nothing more. And it’s VISIBLY nothing more. And why does it look like more in movies, then? Why does it look like more in 3ds Max (newer than 4, at least), then? Because there it’s NOT JUST a texture mapped onto a triangle. It’s complex float math evaluating a much more complex surface.

High-res textures don’t hide that it’s not more than just a map and a triangle. Look at Unreal Tournament 2003. Many more triangles than ever before. It doesn’t look any more natural, or actually detailed, than Unreal Tournament did. Why? It still has the same visual appearance: plastic polygons.

On the other hand, I’ve seen a lot of art that doesn’t even use one texture. It uses procedurally generated data and simple geometry, and looks awesome, much richer in detail. Why? Because of correct lighting, interreflections, real illumination scattering through the scene. THAT makes scenes realistic.

I’m just looking around my desk now… yes, textures would have their use… but the most interesting details actually are the materials themselves… paper, plastic, metal, the “wall thingy”… the desktop is yet another plastic by itself…

Have you ever worked on an offline renderer? Or seen one of your artists working with… say… Brazil? THAT is graphics. You can shove Q3 up your ass then (sorry for the vulgarity… but hey… if you look at it, you understand it). That’s where we want to get to, one day, don’t we? If you now say NO, then I’m sorry, and I’ll stop. If you say “well… if it ran at fast fps, then yes, I would prefer Brazil over Quake 3 or Doom 3”, then you have to start changing your thinking. Textures are lookup tables of real functions. Lookup tables got removed from most parts of modern engines, and they are getting removed, bit by bit, from graphics hw, too…

Now, with fragment programs, you don’t use one normalization cubemap and 2 or 3 distance attenuation textures anymore.

And all of that has nothing to do with premature optimisation. Of course, once you have your work done, you can try to squeeze the last bit out of it. But first, you want to at least be able to GET IT DONE. Fixed point doesn’t get you far there. At least not 16 bit…

If nVidia provided 128-bit fixed-point values, it would be another discussion. Or at least 64 bit, with 32.32 for example…

You say you don’t need that precision? But wouldn’t you WANT it?

And always map it to CPU tasks… how often do you use 8- or 16-bit fixed point instead of simple floats for your 3D app? If you COULD, you would do the same on GPUs. You CAN. On ATI.

May I jump in?

Dave, what exactly are you saying?
You need 1D, 2D, 3D and CUBE float textures?

I think we can remove 1D and 2D.

Why don’t you like RECT fp textures?

Sure, ATI_texture_float is more general and likeable, but personally, I don’t know if I will ever use 3D and CUBE float textures.

Today’s hardware is still weak, really. I would not be able to do much with these extensions.

>>>You don’t want to compress your displaced vertices with RGBE, do you?<<<

If you don’t lose precision, and you gain performance…
And also, for vertices you can store them in a RECT. What’s the problem?

Dave, your arguments are starting to sound reasonable

I guess I know what your problem is: you are an idealist.

The GeForce FX is a good card for the usual DirectX 8-9 tasks - I guess it was designed with this idea in mind. In older games it still delivers a bit higher performance than the Radeons.

If you want to say that the FX is not a DX9 card - I agree with you completely. It’s a transitional step between DX8 and DX9.

BTW, about float textures: NVidia has pack/unpack operations (which are rather useful) - ATI does not. And rect textures are enough for me - for now. The time of fully programmable graphics hardware has not come yet. ATI shaders are also rather limited (dependent texture reads, texture read limitations, dot product only to w).
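As far as I remember, the pack/unpack instructions in NV_fragment_program look something like this; take the exact operand semantics as an assumption to be checked against the spec rather than gospel:

[code]
/* A previous pass stored two half floats in one channel of a float texture
 * with PK2H; this pass fetches and unpacks them with UP2H. */
static const char unpack_fp[] =
    "!!FP1.0\n"
    "TEX R0, f[TEX0], TEX0, RECT;  # packed value in the red channel\n"
    "UP2H H0, R0.x;                # H0.x / H0.y hold the two halves\n"
    "MOVH o[COLH], H0;\n"
    "END\n";
[/code]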

Both ATI and Nvidia make good cards. The API may be more on ATI’s side, though - but you can get good performance out of the FX too - if you know how to do it.

On the earlier discussion: I don’t think the register combiners extension will ever be dropped: it can be emulated with an fp easily, and that’s a job that only has to be done once.

Cheers

>>>The GeForce FX is a good card for the usual DirectX 8-9 tasks - I guess it was designed with this idea in mind. In older games it still delivers a bit higher performance than the Radeons.<<<

OF COURSE it is designed with D3D in mind.
If you compare NV_vp (all of them) and NV_fp to vertex and pixel shaders, you will see that there is plenty in common.
Ditto with register combiners.

The same can be said for ATI. Take a look at fragment_shader. It has plenty in common with ps 1.4.
It is practically an identical mirror of ps 1.4.

But this doesn’t matter. What matters is getting hw features exposed in GL as fast as possible.

>>>If you want to say that the FX is not a DX9 card - I agree with you completely. It’s a transitional step between DX8 and DX9.<<<

It’s a DX9 card because it can do ps 2.0.

Most of the ARB extensions are derived from vendor extensions.
The only problem is that ARB extensions are more limited - but they are better designed and last for generations.

So what do you want? A quick ARB ext,
or evolution from vendor ext -> ARB?