One Pass Rendering Pipeline!

I second the text magic. You could parse any old script into something a GLSL compiler will understand. You could even add game-event logic, but that might hit you hard in the combinatorics.

P.S. Zed, just so you know I’ve reverse engineered your game engine and I’m releasing my first title next week.

The automatic precompiling also worked well for FarCry; it has thousands of precompiled Cg shaders :wink:

Originally posted by Leghorn:
P.S. Zed, just so you know I’ve reverse engineered your game engine and I’m releasing my first title next week.
You’re doing better than me then. Looks like this weekend’s not gonna be too productive either; I’ve buggered my left hand and am forced to type with just my right.
You would have thought that after all those years of wanking my wrists would have been stronger.
PS: latest in-game video
http://www.filefactory.com/?35b6ab
Sorry, I’m not singing in this one.

the automatic precompiling also worked well for FarCry
Almost 3k shaders, AFAICS… design-wise, I’m definitely not convinced… but it does work… I’m going to get to the bottom of the ultimate uber-shader… and see how it goes… still hoping to get Korval’s feedback… till then, keep it up guys, and thanks for the input!

:eek:

The first post on this thread ( all posts by starter of this thread ) is the funniest ‘Guru Pretender’ stuff I ever read… :smiley:

He is completely confused about what textures are, how GPUs work, and what programmable shading is… but thinks he is a next-gen technology architect… Maybe he wished he would some day write something like Carmack’s .plan file, but is too impatient to wait until he can get to that level…

I really wonder why you guys are trying to explain things to him with such effort. How can he even ‘GET’ any of it?

I know I’m hitting very hard… but seriously… this is a complete waste of time… and pretension…

Golgoth: Dude… I’d say you should invest a bit of time in deepening your understanding of the rendering pipeline, starting from the FFP and working towards the programmable one…
My guess is you are an experienced game coder…
It shouldn’t take you much time… With a thorough grip on the basics, the time you spend on design & prototyping of a rendering engine would be much shorter…

Another suggestion: you can download Ogre3D or some other open-source rendering engine out there… Ogre3D is definitely worth a try…
I’d say ‘Why re-invent the wheel, when you’ve already got a radial tyre for free?’

You can instead spend your time on the many other important tasks related to engine development: setting up efficient production pipelines, artist tools, etc… instead of getting into the low-level nitty-gritty details of next-generation rendering pipelines…

Well, if you really love getting deep into this stuff (just like me) you can… but your R&D time is also gonna burn project budget… You can evaluate in that direction… Not to mention the risk of running into dead-end problems in the middle of production, where some brand-new bug stops your artists cold… It can be disastrous…

Commercial engines, or free ones of commercial quality like Ogre3D, are the result of many man-years of work… feature additions… bug-fixing cycles… Are you sure you want to go through all that at the cost of your project time?

The first post on this thread ( all posts by starter of this thread ) is the funniest ‘Guru Pretender’ stuff I ever read…
This is not a helpful message. If you’re just going to attack someone for their lack of knowledge, your presence is not required here.

I agree with Korval. This does nothing for the discussion.

Korval: above you said stencil must be done in 2n+1 passes? I assume the +1 is the z-prepass (or ambient pass), but what is the 2 for? In my experience it requires n+1 passes.

Kevin B

Korval: above you said stencil must be done in 2n+1 passes? I assume the +1 is the z-prepass (or ambient pass), but what is the 2 for? In my experience it requires n+1 passes.
The +1 is the ambient pass. If you do a z-prepass before this, it is +2. I suppose you could fold the ambient pass into one of the other lighting calculations, thus reducing it to +1.

And I consider the rendering of each light’s stencil shadows a pass. It’s not a full shader pass, but it is taking up loads of fillrate due to the long, multiple, likely overlapping volumes being rendered.
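The arithmetic above is easy to trip over, so here is a tiny sanity-check sketch under the assumptions stated in this post: an optional z-prepass, one ambient/base pass, then one stencil-volume pass plus one lighting pass per light. The function name is mine, purely illustrative:

```cpp
#include <cassert>

// Pass count for classic multipass stencil shadowing:
// optional z-prepass, one ambient pass, then per light
// one stencil-volume pass plus one lighting pass.
int shadowPasses(int numLights, bool zPrepass) {
    int passes = zPrepass ? 1 : 0; // fill the depth buffer first
    passes += 1;                   // ambient/base pass
    passes += 2 * numLights;       // stencil volumes + lighting, per light
    return passes;
}
```

So three lights without a z-prepass already cost seven passes, which is why the fillrate of those long, overlapping volumes dominates.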

Originally posted by Korval:
[quote]The first post on this thread ( all posts by starter of this thread ) is the funniest ‘Guru Pretender’ stuff I ever read…
This is not a helpful message. If you’re just going to attack someone for their lack of knowledge, your presence is not required here.
[/QUOTE]Agreed, it does not ‘add’ to the discussion and might sound like attacking someone. But you haven’t figured out my intention. We all came from the same stage of ‘lack of knowledge’… including me.

Why would anyone think asking questions is something to point fingers at?

But these kinds of statements were really irritating…

OpenGL/hardware failed to fulfill my needs (welcome to the club, you’ll say)… first, the maximum number of lights; second, the max texture matrix stack depth, which is 10 on my GF 7800, and 8 texture units… why clamp those values so low? What is the problem with them? I still want the cruel truth here! How can OpenGL developers not do anything about this aberration… what’s going on, who’s in charge here?
Anyway, it looks like you ignored the major part of my post, which was addressed to Golgoth, about the best approach he can take… I stand by my point that it will be more helpful & productive to re-use free resources, as building an engine from scratch would be a significant time investment given the level of know-how required…

Trying to explain cache coherence, teach real-time shadowing algorithms, and cover next-gen engine design in forum threads just does not make sense to me… nor do I think it is helpful to anyone…

Hi everyone!

The first post on this thread ( all posts by starter of this thread ) is the funniest ‘Guru Pretender’ stuff I ever read…
That is one way to see it! English is not my first language, so some comments may come out the wrong way… in my defense… first post, first paragraph:

I can easily say that I’m behind! But I’d like to catch up on the rendering architecture!
I’m impatient and I’m clearly not a guru, but I have some ideas!

My guess is you are an experienced game coder…
Wrong. Before I started C++ tutorials, I was a full-time artist in the game industry for 10 years; you have probably played games I’ve worked on. Anyway… I didn’t know the rabbit hole went that deep… and you are right… there are pieces of the puzzle I don’t have yet, and that’s why I’m here…

I’d say you should invest a bit of time in deepening your understanding of the rendering pipeline
I’m working on it… every day…

OpenGL/hardware failed to fulfill my needs (welcome to the club, you’ll say)… first, the maximum number of lights; second, the max texture matrix stack depth, which is 10 on my GF 7800, and 8 texture units… why clamp those values so low? What is the problem with them? I still want the cruel truth here! How can OpenGL developers not do anything about this aberration… what’s going on, who’s in charge here? …
Everyone seems to know the answer… I don’t… and I do not want to design hardware… but I may consider introducing a new chapter in my .MasterPlan file… : )

I stand by my point that it will be more helpful & productive to re-use free resources, as building an engine from scratch would be a significant time investment given the level of know-how required…
Even though you make it sound like I’ve just opened my first NeHe tutorial… thanks for the tip, but it is too late for me now… I know what I’m looking for and I’ve dug too deep to go back… I was hoping for the big picture, a bullet-point sequence of how lighting is currently handled in a rendering pipeline, and to discuss next-gen possibilities with experienced coders! I have it working in different ways… but I’m still not happy with it… hence the title of this post: without including shadow calculations, is a one-pass rendering pipeline possible?

Trying to explain … next-gen engine design in forum threads just does not make sense to me… nor do I think it is helpful to anyone…
I’m really interested: why would you say that? Because no one is really sure where it is going, or because the people who can answer are bound by NDAs? If we can’t discuss it here, then where?

regards

The +1 is the ambient pass. If you do a z-prepass before this, it is +2. I suppose you could fold the ambient pass into one of the other lighting calculations, thus reducing it to +1.
If by ‘ambient pass’ you mean controlling the color and shadow opacity with the FFP, you may find this interesting:

GL_TEXTURE_COMPARE_FAIL_VALUE_ARB

Unfortunately the extension is not supported on my GF 7800 for some reason… so I couldn’t validate it… but it should save you the ambient pass.

Although years have passed and keep passing, some things are not likely to change much. The top game makers don’t absolutely need a huge matrix stack, or a zillion lights, or 100 texture units, etc. The hw vendors have to make sure they balance things out so that a decent cost/performance ratio can be achieved, even if it means multipassing.

“the maximum of lights”

Several reasons :

  1. The minimum GL asks for is 8, so most offer 8.
  2. GL does vertex lighting. Who likes that?
  3. Shaders are the future. Code your 10,000 lights yourself. Do it per pixel.

“8 texture units”

  1. Today, it’s called a texture image unit. That means a texture unit is no longer tied to a texture coordinate set.
  2. 8 is not bad by today’s standards. ATI and NV offer 16, and they offer 8 tex coords.
  3. Use generic vertex attributes.

“matrix stacks”

  1. Does it matter? Does it improve performance?
  2. Pretend there is no stack at all.
  3. Write your own software stack
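Point 3 is straightforward in practice. Below is a minimal sketch of such a software stack (class and method names are my own, and it assumes you only need push/pop/load semantics, not the full GL matrix-mode machinery); its depth is limited only by memory, not by a driver-reported maximum:

```cpp
#include <cassert>
#include <cstddef>
#include <array>
#include <stack>

// A minimal software replacement for glPushMatrix/glPopMatrix:
// one current matrix plus a stack of saved copies.
using Mat4 = std::array<float, 16>; // column-major, like OpenGL

class MatrixStack {
public:
    MatrixStack() : current_{identity()} {}
    void push()               { saved_.push(current_); }   // save current
    void pop()                { current_ = saved_.top(); saved_.pop(); }
    void load(const Mat4& m)  { current_ = m; }            // like glLoadMatrixf
    const Mat4& top() const   { return current_; }
    std::size_t depth() const { return saved_.size() + 1; }

    static Mat4 identity() {
        Mat4 m{};
        m[0] = m[5] = m[10] = m[15] = 1.0f;
        return m;
    }
private:
    Mat4 current_;
    std::stack<Mat4> saved_;
};
```

At draw time you would upload `top()` yourself (e.g. as a shader uniform), which sidesteps the fixed stack-depth limit entirely.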

You could always consider a 3rd party solution because all this may be above you.
If you are interested in learning GL programming, you have to code it yourself.

The hw vendors have to make sure they balance things out so that a decent cost/performance ratio can be achieved, even if it means multipassing.
I disagree with this; it should be up to the developers to decide what’s most important… hw vendors should leave the door open.

So my question is: what is it that makes texture or light units clamp to a certain number? What difference does it make to hw vendors? Why not leave 128 texture units, for instance… is it just physically impossible, a maximum instruction count on the chip? If yes, why can’t the hardware take care of multipassing on its own?

GL does vertex lighting. Who likes that?

The reason I’d like to get more vertex lights is only for the gl_ built-in variables that let us retrieve light info in shaders… since I’m doing all lights in one pass, using a light struct in shaders will overload the shader quickly… plus, sending light variables via uniforms to shaders is costly… don’t you agree? I know the FFP has to do it as well… but it seems to do it in a more efficient way… and it does not influence the max shader instruction count… maybe someone can confirm this.
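For what it’s worth, one common mitigation (a sketch under my own naming, not something anyone in this thread proposed) is to pack all per-light parameters into one flat vec4 array on the CPU side, so the whole block can be uploaded with a single glUniform4fv call instead of one uniform update per light member. Only the packing itself is shown here; the actual GL upload call is left out:

```cpp
#include <cassert>
#include <vector>

// Per-light data as the shader expects it: two vec4s per light.
struct Light {
    float pos[4];   // xyz + w
    float color[4]; // rgb + intensity
};

// Flatten N lights into one contiguous float array so a single
// glUniform4fv(loc, 2 * N, flat.data()) can send everything at once.
std::vector<float> packLights(const std::vector<Light>& lights) {
    std::vector<float> flat;
    flat.reserve(lights.size() * 8);
    for (const Light& l : lights) {
        flat.insert(flat.end(), l.pos,   l.pos + 4);
        flat.insert(flat.end(), l.color, l.color + 4);
    }
    return flat;
}
```

The shader-side mirror would be a `uniform vec4 lights[2 * MAX_LIGHTS];` indexed in pairs; whether that beats the gl_LightSource built-ins on a given driver is exactly the open question above.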

8 is not bad by today’s standards. ATI and NV offer 16, and they offer 8 tex coords. Use generic vertex attributes.
On current nvidia hw there are 16 attributes, and 13 are already reserved by standard vertex attributes… 1, 6 and 7 are free… I’ve filled those up already… so I have to make cruel choices if I’m going to use glTexCoord attributes for something other than what they’re meant for… I heard there is no standard/generic attribute overlap on ATI hw… which leads to 16+13 attributes… right?

Does it matter? Does it improve performance?
Yes it does… it says 10, but I can’t go beyond 8 for some reason or GL crashes… if you have 2 shadow maps and 1 occlusion map, that leaves 5 for textures… the problem with this is that 3D assets sometimes need several UV sets for artists to work on texturing… to overcome the limitation we must flatten the work before export… but once it is flattened it is a real pain to do more work on an asset’s UVs… and rolling back each asset before flattening is nonsense… which is a major downside for workflow performance. 8-10 is a tight closet… at 16 (which would make sense, to match the texture units, wouldn’t it?) we would start to breathe a little! Each texture unit should have its own matrix… If I wrote my own stacks I’d have uniform overflow.

You could always consider a 3rd party solution because all this may be above you.
The whole thing is hw-related AFAICS; 3rd parties will still have to deal with multipassing… see the title of this post! If “vendors have to make sure they balance things out so that decent cost/performance” is the answer to all this… that’s all I needed… I’m gonna put this one-pass thing on hold and hope for more power under the hood in the next gen of video cards… till then, do as everyone else does and go multipass… and this thread is closed!

What is it that makes texture or light units clamp to a certain number? What difference does it make to hw vendors? Why not leave 128 texture units, for instance… is it just physically impossible, a maximum instruction count on the chip? If yes, why can’t the hardware take care of multipassing on its own?

sending light variables via uniforms to shaders is costly…
You almost answered yourself. You can do it currently, but it is slow. Why? Because there is no ultra-fast hardware support for it. Why? Because more hardware means more silicon transistors. Which means more potential failures, which means higher production costs.

Of course, if there is a market, it gets done. The first GeForce FX batches had almost 50% reject rates at the end of the line…

What is it that makes texture or light units clamp to a certain number? What difference does it make to hw vendors? Why not leave 128 texture units, for instance… is it just physically impossible, a maximum instruction count on the chip? If yes, why can’t the hardware take care of multipassing on its own?
It’s expensive… transistor-count-wise… chip-manufacturing-cost-wise… heat-emission-wise, etc…
Physically impossible… in a way, yes…

They can only put n transistors on a chip… As technology moves ahead, they are reducing transistor size (the fabrication process) and are able to fit more transistors on the chip…

Basically, ‘texture units’ or ‘light computation units’ are, at the lowest level, bunches of transistors laid out to compute some stuff…

If you look at how the hardware evolved, it becomes obvious…

Riva 128 = 1 texture unit
Riva TNT = 2 texture units
Riva TNT2 = 2 texture units, but faster & with more features
GeForce = 2 texture units, faster, lighting units added (hardware TnL)
(Up to this point, texture units could only apply a bunch of fixed ways of handling texels, i.e. fixed function: ‘add these two textures’, ‘multiply these’, ‘multiply 2X’, etc.)

GeForce 3 = 4 texture units… and programmable, too…

With each generation there are more transistors you can put on a chip… It’s not just texture units… there are a lot more things GPUs need to do… So GPU designers break their heads on how best to use the given transistor count…

It’s a balancing act…

Originally posted by Golgoth:

So my question is: what is it that makes texture or light units clamp to a certain number? What difference does it make to hw vendors? Why not leave 128 texture units, for instance… is it just physically impossible, a maximum instruction count on the chip? If yes, why can’t the hardware take care of multipassing on its own?

Anything you add to the HW increases the cost of the chip, both directly (fewer chips per silicon wafer, higher chance of defects) and indirectly (higher cost of development and testing). Unless there is high demand and reasonable use for having 128 separate textures accessible from a shader, that cost is simply not justified.


The reason I’d like to get more vertex lights is only for the gl_ built-in variables that let us retrieve light info in shaders… since I’m doing all lights in one pass, using a light struct in shaders will overload the shader quickly… plus, sending light variables via uniforms to shaders is costly… don’t you agree?

What you really need is uniforms shared between programs, which were unfortunately not included in the GLSL API.

Each texture or lighting unit has nontrivial semantics and state associated with it for the purposes of the FF pipeline, beyond the values you access from the shader. If the driver reported, say, 128 lights, it would have to support FF operation with all those lights enabled and with maximum features.


On current nvidia hw there are 16 attributes, and 13 are already reserved by standard vertex attributes… 1, 6 and 7 are free… I’ve filled those up already… so I have to make cruel choices if I’m going to use glTexCoord attributes for something other than what they’re meant for…

You can use a generic attribute that nVidia aliases with a conventional attribute; you simply cannot use that conventional attribute at the same time. To simplify things, do not use the conventional attributes at all and send all input through generic attributes.
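To make the budget concrete, here is a small illustrative allocator (entirely my own construction, not a GL API) that hands out generic attribute indices from the 16-slot budget while skipping a caller-supplied set of aliased slots. Following the advice above, you would simply pass an empty reserved set and bind everything via glBindAttribLocation with the indices it returns:

```cpp
#include <cassert>
#include <bitset>

// Hands out generic vertex-attribute indices from a fixed budget of 16,
// optionally skipping slots treated as aliased to conventional attributes.
// Which slots alias is driver-dependent; the caller supplies the set.
class AttribAllocator {
public:
    explicit AttribAllocator(std::bitset<16> reserved) : used_(reserved) {}

    // Returns the next free index, or -1 when the budget is exhausted.
    int allocate() {
        for (int i = 0; i < 16; ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return i;
            }
        }
        return -1;
    }
private:
    std::bitset<16> used_;
};
```

With the “everything but 1, 6 and 7 is taken” situation quoted above, such an allocator would hand out exactly those three slots and then report exhaustion.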

Thx guys!

this is really interesting.

Don’t forget, about 12-15 years ago an SGI workstation geared up for Photoshop and Softimage on Unix cost around 50 to 100 thousand… my question is: beyond the mainstream market right now, is anyone aware of a high-end video card that is pushing the current limits? Some cards that would be mainstream in 2-3 years, maybe?

The whole thing is hw related afaics, 3rd party will still have to deal with multipasses…
Exactly… So it means the same issues exist even if you are using Direct3D for the rendering engine…

It might make things much clearer if you spend some time on the Direct3D API as well… You would see that the numbers (texture units, texcoord sets/interpolators, uniforms/constant regs) are exactly the same there… It would help you separate API limitations from hardware limitations…
Though I don’t think either API adds any limitations… You can do the same things in either API; only the convenience differs. Engine design is purely constrained by hardware architecture…

Basically we are programming the same hardware, but with a different API… The same applies to shader languages… GLSL / HLSL… the syntax changes here and there…

You might also find the Direct3D 10 preview & documentation a very interesting read… The next generation of hardware to come… D3D10 hardware (by which I mean hardware that conforms to the Direct3D 10 specifications) is gonna be a happy time for us coders and artists… Most of the limiting numbers of this generation that you talked about are going to be magnitudes higher. Also, there is a new shader type: the Geometry Shader, which sits between the vertex & fragment processing stages and can be used for lots of exciting stuff.

Originally posted by Golgoth:

my question is: beyond the mainstream market right now, is anyone aware of a high-end video card that is pushing the current limits? Some cards that would be mainstream in 2-3 years, maybe?

2-3 years? Then D3D10 HW is probably what you are looking for… You will find these very interesting:

http://www.gamedev.net/reference/articles/article2316.asp

http://enthusiast.hardocp.com/article.html?art=MTA0NSwxLCxoZW50aHVzaWFzdA

While the PlayStation 3 GPU (nVidia RSX) and the ATI Xenos GPU in the Xbox 360 are pretty much current-gen hardware (D3D9c / GL 2.0), Xenos is something in between D3D9 & D3D10, though more towards D3D9.

The next next-gen of 3D hardware could be something completely different from the rasterizer GPUs we have today…

See… rasterizer 3D rendering is basically a hack to show 3D worlds at interactive speeds… In reality, shadows are the absence of light… but that does not apply to our rendering techniques… because the lighting & shadowing we do now are basically two different hacks stacked up to make believe they are inter-related.

Whereas ray tracing is more like how lighting/shadows work in the real world… Ray tracing would fundamentally solve issues which are so hard to achieve on today’s GPUs… like caustics, complex fully-dynamic lighting, soft shadows, real-time global illumination, etc…

Things like area lights, reflections, inter-reflections, colored shadows, etc., are complex & expensive to implement even as rough approximations on GPUs, while they come inherently with ray-tracing solutions…

There are real-time ray-tracing hardware chips in development, but only in university labs… There are prototype hardware & demos (SaarCOR) which are already showing incredible images that are almost impossible on GPUs… I am not sure if nVidia & ATI are developing something RT in their backyard, but, if properly funded, RT hardware would be truly a revolution in real-time rendering…

KC

That’s not entirely true. Ray tracing is just another approximation of the real world. Of course, some things, like arbitrary surfaces, reflection and refraction, are better described by ray tracing.

But diffuse lighting, caustics and realistic (soft) shadows are still a weakness of ray tracing. Of course, we have some hacks to solve these too. For diffuse lighting, the same approximation we use now is used. Caustics and soft shadows can be done by casting a lot of rays, but that’s still an approximation.

I also disagree that colored shadows and so on are hard to implement with a polygon renderer. Hard shadows are something polygon renderers can currently do very well, and doing soft shadows is equally hard on a ray tracer.

I agree that a hardware ray tracer would be nice to have. But I don’t think quality would improve that much, because all these hacks work really well. Ray tracing is not really more realistic; it is just a different approach, not necessarily better or worse than polygon rendering, with each method having its own strengths and weaknesses…