Carmack .plan: OpenGL2.0

Someone has the data, I really don’t care who.

That data is gone. Note that the rendering pipeline goes one way: vertex shader/fixed-function feeds the rasterizer, which feeds the fragment pipeline, which feeds the pixel blender.

In order to do what you’re suggesting (having post-T&L data lying around), the vertex shader would have to be done in software and the data stored on the CPU. This is, quite simply, completely unacceptable from a performance standpoint.

It is currently impossible to simply read back data from a vertex shader and store it to be multi-passed over again. And even if you could, as I explained before, it wouldn't help, since you need to run the other portions of the shader on each individual set of data.

This means that on the first pass the HW would interpolate the first 4 UVs, then on the last pass interpolate the last 4 UVs. Not all that complicated.

Then let me complicate it for you.

Let’s assume your equation is the following:

T1 * T2 + T1 * T3 + T1 * T4 + T1 * T5 + T2 * T3 + T2 * T4 + T2 * T5 + T3 * T4 + T3 * T5 + T4 * T5 = output color.

Oh, and the output color will be alpha blended with the framebuffer.

Given a Radeon 8500 (with more blend ops, perhaps), this is trivial; no need to multipass. Given a GeForce3, this is most certainly a non-trivial task: reducing the equation into a workable set of passes is anything but automatic.
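
As an aside (nothing anyone in the thread spells out), the pairwise-product sum does have a closed algebraic form, and spotting this kind of factoring is exactly the sort of reduction a driver-side shader "compiler" would have to discover on its own:

$$\sum_{1 \le i < j \le 5} T_i T_j \;=\; \frac{1}{2}\left[\Big(\sum_{i=1}^{5} T_i\Big)^{2} \;-\; \sum_{i=1}^{5} T_i^{2}\right]$$

Whether a GeForce3's combiners could even evaluate the squared-sum form without clamping away the intermediate range is a separate question; the point is only that the reduction itself is non-obvious.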

Note that each texture coordinate came from a vertex shader program that may have performed similar operations.

It just might be a little slow, worst case.

But it means less work for me, and it probably isn't going to be any slower than any fallback case I would need to write to support that effect anyhow. I don't see how it would be slower, since a lot of the work isn't being duplicated anymore.

Multipass == slow. It is far slower than a single-pass hack. I, for one, refuse to use any multipass algorithm unless it produces a particularly good effect (and even then, it had better only require 2 passes).

Not only that, if you're building your shader relatively dynamically (say, based on slider-bar values or a configuration screen), then the shader 'compiler' has to compile dynamically. Splitting a vertex shader into two passes is a non-trivial algorithm. Worse, it can end up making the vertex shader even slower.

Not only that, verifying that a shader fits within the resource limitations of the hardware isn’t a trivial task.

Saying that this kind of thing is a relatively easy task that will not impair the performance of the hardware is simply erroneous. Besides, I'm more inclined to believe Matt than John Carmack about the potential nightmares of implementing such a system in drivers. Carmack's job is to get people like Matt to do that work for him.

I think you’re on the right track. While both the NV20/25 and R200 support some kind of loopback to extend their texture stages, one could still argue that it’s just ‘pipe combining’ and you can’t go over your total limit of physical TMUs (of all pipes combined).
That’s the easy way to do it, I believe, but what do I know about these chips, really …

The reason there is a limit to what can be done in a single pass is that there is a limit to how much texture-state information can be stored on-chip. Take the original TNT, for example. It has only 1 texture unit. But it has register space for 2 active textures, which were accessed via loop-back. Most of the time, it is more efficient to store additional register state for active texture objects than to actually have more texture units.

As to why there isn't more loopback? Simple: register space isn't cheap. Because the Kyro was a tile-based renderer, it could probably get away with having lots of register space for texture objects per polygon (I don't know enough about the specifics to say why, but given the unorthodox nature of tile-based renderers, I'm willing to believe it).

Hi all,

I don't really understand why a thread with John Carmack's name in it gets sooo hot. I have a few points to make.

  1. John Carmack is what he is now because he’s got amazing people with him doing those graphics.

  2. Though there were a lot of games using the Quake3 engine, none made use of it as well as id Software did. Again, the credit goes to the graphic designers who brought the game alive.

  3. Stop worshipping him as an idol; he's just like you and me.

  4. If John Carmack is reading this, I am sure he understands.

  5. I am not jealous of him or anything; I still think he is as good as you and me.

  6. I can go on…

-Sundar

[This message has been edited by Sundy (edited 06-30-2002).]

In order to do what you’re suggesting (having post-T&L data lying around), the vertex shader would have to be done in software and the data stored on the CPU. This is, quite simply, completely unacceptable from a performance standpoint.

But doesn’t the HW have this data though? This is where it should happen. The HW simply re-rasterizes the triangle.

T1 * T2 + T1 * T3 + T1 * T4 + T1 * T5 + T2 * T3 + T2 * T4 + T2 * T5 + T3 * T4 + T3 * T5 + T4 * T5 = output color.

Yeah, unless you did some crazy work, this shader would fail. You would have to place one restriction on a shader (and you might think this is a huge restriction). Let's say you had 4 TMUs (TMU0-3), but you needed 8 TMUs in your shader (TMU0-7). The restriction would be that TMU0-2 could not interact with TMU5-7, and vice versa. However, TMU3/TMU4 could interact. You can think of these 2 TMUs as the bridge across the virtual TMU gap.

In this case, the in-between result of TMU3/TMU4 would require a temp buffer to "carry over" the results, so the shaders can be combined. I know you're thinking that I can pull this off now, using a render target and current HW, but that is not very scalable, and my shader would never take advantage of future HW unless I planned ahead.

Instead, as more TMUs are introduced, my shaders just get faster, without me having to do any more work.
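
A minimal sketch of that "carry over through a temp buffer" idea, in plain OpenGL 1.x-era calls (the tempTex object, the sizes, and the draw_pass1/draw_pass2 helpers are all invented for illustration; a real version would also have to match filtering, precision, and screen-space texture coordinates between the passes):

```c
#include <GL/gl.h>

/* App-side placeholders (invented names): each sets up its own texture
 * units / combiner state and draws the same geometry. */
void draw_pass1(void);
void draw_pass2(GLuint carriedTex);

/* Assumes a current GL context and a pre-created RGBA texture object
 * `tempTex` that is at least `width` x `height` texels. */
void draw_effect_in_two_passes(GLuint tempTex, GLsizei width, GLsizei height)
{
    /* Pass 1: evaluate the terms that fit in the physical TMUs. */
    draw_pass1();

    /* "Carry over" the intermediate result: copy the region we just
     * rendered from the framebuffer into the temporary texture. */
    glBindTexture(GL_TEXTURE_2D, tempTex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);

    /* Pass 2: feed the carried-over result back in as one of the textures
     * (sampled with screen-space coordinates so each fragment reads back
     * its own pass-1 value), evaluate the remaining terms, and let
     * framebuffer blending add the two halves together. */
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);
    draw_pass2(tempTex);
    glDisable(GL_BLEND);
}
```

This is essentially the render-target trick already mentioned above; the argument is just that the driver, not the application, should be the one generating it.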

Oh, and the output color will be alpha blended with the framebuffer.

This is a post operation that would be handled no differently than it is now. Would it not?

ahhh eassssyyyyyyyyy you found the final solution. doing the multipass per triangle. i mean, state changes are cheap, we can do that all the time…

doesn't sound like you know that much about how the hw works, do you?

i know it CAN be done to set up the shaders automatically, but as the different parts of the gpu have very different power and programmability, it's a hell of a complicated thing to get working. if you say it's that easy, why not provide your own interface? you could make an ext out of it if you want, and nvidia and the others could then implement it directly to gain some speed. i mean, you should be able to get it working as well…

there isn't even a really simple register combiner language possible, they are too specialized… it's faster to code that stuff directly… setting up a multipass system is in fact quite easy, but the different passes i want to code myself. no one can beat me at coding fast shaders. no computer, at least…

and if you can, you prefer to stay single-pass by dropping some small features/accuracy…
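
For what it's worth, the "easy" half of that claim looks roughly like this in plain OpenGL (the Pass struct and the callbacks are made up here; the hard part, deciding what each pass actually computes, is exactly what the post says it wants to keep doing by hand):

```c
#include <GL/gl.h>

/* Each pass is just a state-setup callback plus a blend mode; the
 * framework loops over them and re-draws the same geometry. */
typedef struct {
    void   (*setup)(void);        /* bind textures/combiners/shaders for this pass */
    GLenum srcBlend, dstBlend;    /* how this pass combines with the framebuffer */
} Pass;

void draw_multipass(const Pass *passes, int numPasses, void (*drawGeometry)(void))
{
    for (int i = 0; i < numPasses; ++i) {
        if (i == 0) {
            glDisable(GL_BLEND);              /* first pass replaces the framebuffer */
        } else {
            glEnable(GL_BLEND);               /* later passes blend on top */
            glBlendFunc(passes[i].srcBlend, passes[i].dstBlend);
            glDepthFunc(GL_EQUAL);            /* only touch the already-visible fragments */
        }
        passes[i].setup();
        drawGeometry();
    }
    glDepthFunc(GL_LESS);
    glDisable(GL_BLEND);
}
```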

Sundy, i'm 100% with you. There are some people here who need to face reality and stop worshipping Carmack. I'm tired of hearing Carmack here, Carmack there, Carmack is the best programmer in the world, etc… He's a good game programmer, period. There are many people better than him. He did not invent BSPs. He did not invent lightmaps. And he certainly did not invent per-pixel lighting. His code design/quality seems to be very average (efficient, ok, but not nice). And what about all the people who are working with him? Artists, true, but also other programmers, musicians, designers, etc.? Don't they deserve as much credit as him? Would you be so impressed by Doom3 with crappy graphics? Sorry for the rant. I know it won't change anything, but i feel better… :)

Y.

Nope, just 2 core engine programmers at id - carmack and another guy.

Originally posted by mcraighead:
> I honestly see it as infeasible, or
> at least much worse in performance.
>
> The only way I can imagine implementing
> it in the general case is to use a SW
> renderer. (And that’s not invariant
> with HW rendering!)
>
> And eventually, for a big enough program,
> the API has to refuse to load it.

Why? For any assembler-level program, no matter how large, I can imagine a trivial (totally inefficient) multipass implementation in which every instruction is implemented as a single pass, with intermediate results being written to memory. Work backwards from there, collapsing passes as far as you can, to imagine a more optimal implementation. Loops and branches might need to be done in software, but that doesn’t introduce invariance in itself.
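
The "trivial schedule, then collapse" idea can be sketched in a few lines of C (the Instr/Pass types and fits_in_one_pass are placeholders invented here; a real driver would need far more bookkeeping for register lifetimes and data dependencies):

```c
typedef struct { int opcode; int args[3]; } Instr;      /* placeholder instruction */
typedef struct { const Instr *first; int count; } Pass;

/* Hardware-specific check: can these `count` consecutive instructions be
 * mapped onto a single pass (combiner stages, texture fetches, temps, ...)? */
int fits_in_one_pass(const Instr *first, int count);

/* Greedy version of "one instruction per pass, then collapse": grow the
 * current pass until it no longer fits, then start a new one.  Between
 * passes, intermediate results would be written to off-screen buffers,
 * which is where the (large) performance cost comes from. */
int schedule_passes(const Instr *prog, int n, Pass *out)
{
    int npasses = 0, start = 0;
    while (start < n) {
        int count = 1;
        while (start + count < n && fits_in_one_pass(&prog[start], count + 1))
            count++;
        out[npasses].first = &prog[start];
        out[npasses].count = count;
        npasses++;
        start += count;
    }
    return npasses;
}
```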

> The easiest example of a shader that
> cannot be broken down into multiple
> passes on a GF3 is a specular exponent
> done in real floating-point math.

[snip]

> If you don’t have this math operation
> per-pixel (at bare minimum, you need
> per-pixel exp() and log() to emulate
> pow()), it’s essentially impossible to
> emulate it.

Yes, but the problem then isn’t really a problem to do with multipassing per se; you can’t even do that single operation in a single pass, if I understand you correctly. So it’s not a very interesting case, as a counterexample for multi-passing, is it?

If your point is that multi-passing alone isn’t enough to let you emulate any given program on any given piece of hardware, then sure you’re right. You hardly need a counterexample to prove that. The hardware must support the basic operations of the language at some level, if you want to run any programs at all. So I don’t think your example is very relevant.

Originally posted by knackered:
Nope, just 2 core engine programmers at id - carmack and another guy.

Sure about that? There are 5 programmers on Doom 3:
John Carmack
Graeme Devine
Jim Dose
Robert Duffy
Jan Paul van Waveren

I think he means 'core programmers'. I wouldn't expect all those people to be working on the engine. I expect some to be working on scripting and tools etc.

Originally posted by Robbo:

I think he means 'core programmers'. I wouldn't expect all those people to be working on the engine. I expect some to be working on scripting and tools etc.

Primary Contributions:
John Carmack - graphics
Graeme Devine - according to JC, "Graeme's primary task is going to be a completely new sound engine"
Jan Paul van Waveren - physics engine
Robert Duffy - Tools
Jim Dose - Game logic, animation system, scripting

So that's at least 3 "core programmers". And that's assuming you don't consider tools or scripting systems to be a core component of the engine (and as complex, customized, and integrated as these are, I personally would consider them to be "core").

I think some people need to get away from the notion that graphics are the game, graphics make the game, graphics are the toughest part of the game, graphics programmers deserve all the real credit. I personally consider a good physics engine to be much more complex and difficult to implement than a good graphics engine.

[This message has been edited by LordKronos (edited 07-01-2002).]

Usually everything is built around a scripting system, so I would consider this one of the main components of the engine too.
I agree, a physics engine is a lot more complex to implement than a renderer (it requires some theory too). The actual rendering is not what's impressive; it's the preprocessing that needs to be done on the geometry (and this part can get complex).

JC only does the fun part; physics & collision detection are the real nasty parts of an engine. How many games are there where you have good graphics but you get stuck in walls, things float in mid-air, …

Core graphics engine programmers is what I was talking about, seeing as the discussion is about graphics. I used the word 'core' to avoid replies like yours, lordkronos… obviously naively hoping that, just for once, the discussion wouldn't shoot off down another semantics avenue.
Yes, collision detection, physics, and sound are important too - yawn.
Ysaneya - all id games (except perhaps the original Doom) have contained distinctly average graphics and sound assets, the gameplay is renowned for being an empty experience with little imagination, and the tools are bad (compare qradiant to Worldcraft to see what I mean… qradiant was written by an id person, while Worldcraft was written by someone at Valve)… so I wouldn't give any of the other contributors to their various titles any credit at all. The graphics engines, though, have always been first class, very efficient and capable (if you like wandering around indoors). It's left to the mod makers and the people who license the engines to make the 'real' games.
There…counter-rant over.

The reason Carmack gets lots of credit is because of what he did in the earlier days; Doom and Quake were great leaps, and the graphics engines for those probably were pretty hard to program (this includes having them run at acceptable speeds).

Yes, now there are more coders, but then much more goes into a game; it's no longer acceptable to have mediocre sound or buggy physics or stupid AI, when every other developer spouts off about how 'revolutionary' their approach is to a particular area mentioned above (they never say revolutionary graphics engine though, because they know id's will wipe the floor with them…). But anyway, let's just see how D3 turns out.

-Mezz

I have to differ with all of you on what is the most important aspect of a game. At 31 I can no longer stomach rushing around being `fragged’ every 30 seconds with some spotty kid trying to stick a rocket up my backside.

For me the critical aspect of a game is in the design (not of the engine, but the storyboard). I cannot over-emphasise the importance of narrative. Even totally linear games like Half-Life and Homeworld are a real joy to play because of the story that's unfolding before you. Quake III (for example) didn't have this narrative, so for me it was eye-candy coupled with a Motor/Somatosensory Cortex and Cerebellum test [fun for a while but not much 'I wonder what happens next' in there].

I read that id have a fiction writer storyboarding the entire game. This is good news for all fans of narrative. Couple that with JC's obvious graphical skills and I can see it being a winner.

Personally, I think the really amazing things Carmack did were the original Doom and the step up to 6DOF in Quake. Since then things have been progressing steadily, but nothing groundbreaking has arrived. Perhaps Doom III will be the first game to put it all together (graphically) - and show what the new gfx technology can really do. I’ve seen lots of stencil shadow demos, lots of bumpmapping demos, lots of lighting demos etc. I can’t wait to see a game that does the lot.

Just a few thoughts above.

Originally posted by ehart:
Just a couple random comments here.

[snip]

I agree fully with Evan’s post.

OpenGL2 is setting a direction for hardware to grow into. Supporting current generation hardware is not one of its primary goals. Having the API chase the hardware (as has happened for the last several years) has gotten OpenGL into the state it is in currently: fragmented, and lagging behind the other major API in feature set. Only by setting a clear path forward for the next 5 years will OpenGL survive. Anyone claiming it is 'too hard' to implement parts of OpenGL2 on yesterday's hardware does not understand the real purpose of OpenGL2.

The new API ideas that have been proposed come from conversations with real ISVs facing real problems. In addition, we have tried to harmonize features that are readily available today, but through a different extension on each vendor’s platform. Hardware vendors need to recognize this, and adapt their future hardware design to facilitate the new API ideas. (We can discuss what these new ideas are exactly, but if the fundamental principle of a “forward-looking API” is not agreed on, feature discussion is a moot point).

Note that this approach is nothing new. This is how SGI, back in the old days, set a vision for OpenGL 1.0. Most ARB members at that time were not able to do in hardware what OpenGL 1.0 was proposing, but they went with it anyway, and adjusted their next-generation hardware plans accordingly. That approach made OpenGL wildly successful.

John Carmack has an attractive point. Why should he, or any ISV, have to worry about all kinds of hardware resource limits while writing shaders in a high-level language? Why should an ISV care how many instructions a fragment shader happens to be able to use? How many temporary registers there are? How many uniforms it can use? How many texture stages it can use? If you have to worry about those limits, you're still writing different back-end renderers for different hardware. The whole point is to enable the ISV to write fewer back-ends (preferably just one). Now, if they want to spend the effort on writing more back-ends, because of performance, invariance issues, fear of driver multi-passing, etc., they can do so. But they are not required to do so. Of course this goes hand in hand with some kind of query mechanism where you can find out how well a given shader maps onto a certain piece of hardware (how many passes it would take to run this shader, for example).
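
To make that workflow concrete, here is a purely hypothetical sketch (none of these names come from any actual OpenGL2 draft): hand the driver a high-level shader, then query how well it mapped to the hardware and fall back by choice rather than necessity.

```c
/* All names below are invented for illustration only. */
typedef int ShaderHandle;

ShaderHandle compile_shader(const char *source);   /* invented */
int query_shader_passes(ShaderHandle sh);          /* invented: how many passes did the driver need? */

ShaderHandle choose_shader(const char *fancySrc, const char *simpleSrc)
{
    ShaderHandle sh = compile_shader(fancySrc);
    if (query_shader_passes(sh) > 2)       /* too many driver-generated passes for our taste */
        sh = compile_shader(simpleSrc);    /* fall back by choice, not because loading failed */
    return sh;
}
```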

OpenGL2 is the direction of the ARB (see the last ARB meeting minutes). An OpenGL2 working group has been formed, and the majority of the ARB members have volunteered people to participate in this working group. Initial drafts of specifications for the OpenGL Shading Language and the OpenGL 1.3 extensions to support it were circulated to the ARB three weeks ago. Progress is being made in converging on what exactly it should look like. 3Dlabs has played an important role in nurturing the OpenGL 2.0 initiative, but our goal here is to provide a forward-looking API that exposes next-generation programmable hardware at the highest level possible, and beyond any vendor's specific "hardware gadgets".

Barthold
3Dlabs

ahhh eassssyyyyyyyyy you found the final solution. doing the multipass per triangle. i mean, state changes are cheap, we can do that all the time…
doesn't sound like you know that much about how the hw works, do you?

Sarcastic much?

I realize that this would be hell on the internal texture cache, etc, of most current HW. I’m just thinking out loud. Thinking a little “outside of the box” if you will.

Ok, worst case scenario, the entire emulation is done in the driver. As a matter of fact, the driver may very well do the same thing I would have done, or something close. But this is for the driver to figure out. There still may be limitations, and all emulations might not be possible. But it would make life a lot easier for a lot of people. I'd rather see 1 driver save the work for 100s of coders, rather than each and every one of those coders redoing work, and possibly screwing something up.

As far as worrying that your effect would require too many passes to emulate, there is still nothing stopping you from writing these fallback cases. In this case, I think it is a good idea for you to do this work, as the driver cannot guess how much you would be willing to "give up" to achieve similar results. The driver would only emulate the effect if the results would be the same.

[This message has been edited by John Pollard (edited 07-01-2002).]

>>John Carmack has an attractive point. Why should he, or any ISV, have to worry about all kinds of hardware resource limits, while writing shaders in a high level language? Why should an ISV care how many instructions a fragment shader happens to be able to use? How many temporary registers there are? How many uniforms it can use? How many texture stages it can use? If you have to worry about those limits<<

i agree completely.
this has been discussed lots of times in these forums under different guises + the common consensus is:
do the shader, do a small test, time it, is it quick enough? yes? use it. else use a simpler shader.
in the past another word was inserted instead of shader, e.g. vertex blending.
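
A rough sketch of that "time it, then decide" test (the helper names and FRAME_BUDGET_MS are invented; clock() is crude and drivers often busy-wait in glFinish(), so a real app would use a high-resolution wall-clock timer, but the idea is the same):

```c
#include <GL/gl.h>
#include <time.h>

/* Set up the candidate shader path, draw a representative batch, force
 * completion, and time it.  Treat the result as a rough indicator only. */
double time_shader_ms(void (*setup_path)(void), void (*draw_test_batch)(void))
{
    setup_path();
    glFinish();                 /* flush any previously queued work */
    clock_t t0 = clock();
    draw_test_batch();
    glFinish();                 /* make sure the GPU really finished */
    clock_t t1 = clock();
    return 1000.0 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
}

/* Usage sketch (names invented):
 *   if (time_shader_ms(setup_fancy_path, draw_benchmark_scene) > FRAME_BUDGET_MS)
 *       setup_simpler_path();
 */
```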

i just want to see the driver developer who can code such a complex thing. if it were that easy, it would be that easy for us all as well, and we would already have the shader-converter and there would be a GL_ARB_general_shader here by now. it's just not feasible to code this… it's much, much, much too much work. cpus are a much easier target for compilers to generate optimized code for, but the gpu has some VERY VERY VERY different parts which simply don't work together sweetly. say you want to do some dot products and they don't fit into one pass… blending can NEVER do dot products (except with 12 or 24 passes, don't remember… and with clamping errors…), so you have to split the dots between the passes and hope they aren't needed directly, and all that…

you can draw into rgba independently to store up to 4 results, bind that as a texture afterwards or blend to it in some fancy way (but then you can't extract the components individually, btw…). so what? it's just a big no-no…

SHOW ME THAT IT CAN BE DONE. you essentially have the same interface as the driver developers have; at least on nvidia hardware you get full access to the vertex callback (my name for vs or vp), the texture lookup settings (texture shaders), the fragment combiners (register combiners), and the framebuffer blending (blending, depth test, alpha test, stencil test). the hw can't do more. possibly the driver devs could implement it a little more efficiently, but they can't add more features.

so DO IT YOURSELF. SHOW THAT IT'S FEASIBLE. then soon you'll get a job at nvidia or ati… you would solve the thing that hasn't been solved for some years now (ever since the first multitexturing came along, we've had this problem with fallbacks and multipass/singlepass paths).

show me you’re god. carmack would love you…