One Pass Rendering Pipeline!

Originally posted by Golgoth:
Does it matter? Does it improve performance?
Yes it does… it says 10 but I can't go more than 8 for some reason or GL crashes… if you have 2 shadow maps and 1 occlusion map, that leaves 5 for textures… the problem with this is that 3D assets sometimes need several UVs for artists to work on texturing… to overcome the limitation we must flatten the work before export… but once it is flattened it is a real pain to do further work on the asset's UVs… and rolling back each asset before flattening is nonsense… which is a major downside for workflow performance. 8-10 is a tight closet… at 16 (which would make sense to match the texture units, wouldn't it?) we will start to breathe a little! Each texture unit should have its own matrix… If I wrote my own stacks I'd have uniform overflow.
You do know I was talking about the matrix stacks, right? There is the projection matrix, modelview, texture, and I believe one for the color matrix, which is part of the imaging extension.

“If 10 is not enough, then you can write your own software stack” means you pretend the stack size is one and you code yourself something that handles the stacks in your own memory buffers.

2-3 years, then D3D10 HW is probably what you are looking for… You will find these very interesting…
Indeed!

ATI is working closely with Microsoft to make sure the DirectX 10 API and their GPU programmability is accessible to game developers.
Developing with OpenGL on NVIDIA hardware kind of leaves me a little perplexed… but it really sounds promising! Regarding the new architecture, what is the juicy stuff in OpenGL 3.0? DX10 vs. OpenGL 3… a new battle of titans?

DirectX 10 is deeply embedded into Windows Vista operation and we currently know of no plans by Microsoft to allow Windows XP to officially support the new API.
They are going right for our pockets; let's face it, what they really care about is their .ConquerTheWorldPlan file.

What we mean by API object overhead is that the API is using CPU cycles to achieve tasks necessary for rendering before being output to the video card for drawing. When rendering a game, the application first has to call to the API and then the API calls to the driver before it ever interacts with your video card’s GPU. These calls are all handled by the CPU, using valuable resources and creating a potential bottleneck.
Isn't that why UNIX systems are so stable… there are no CPU cycles needed to achieve tasks on the video and sound cards?

Isn't that why UNIX systems are so stable… there are no CPU cycles needed to achieve tasks on the video and sound cards?
It doesn't matter what OS you use… modern PC architecture requires the processor to dispatch commands to hardware. As far as the API is concerned, it simply provides a standard "interface" to application developers. Under the hood, the API is sending commands to another standard interface in the driver. However, it's not a simple indirection, but rather a slew of messages, spin locks, semaphores, etc., and is really quite a bit of "stuff". Unix probably doesn't have as many things to synchronize and schedule as far as graphics and rendering are concerned, but it still requires the CPU to get involved. In short, Unix probably just does less.

Kevin B

Oops, sorry V-man, your last post slipped my attention…

You do know I was talking about the matrix stacks, right?
I kind of mixed up matrix count and matrix stack depth there… but I was indeed thinking about the matrix stack depth…

and I believe one for the color matrix

glMatrixMode(GL_COLOR) confirmed!

“If 10 is not enough, then you can write your own software stack” means you pretend the stack size is one and you code yourself something that handles the stacks in your own memory buffers.
Hmm… I'm going to meditate on this for a while… just can't figure this out atm…

btw: I'm still hoping to get Korval's feedback on the post starting with:

I must say, I was literally glued to the screen; it was really interesting reading you!
If it is not too much to ask.

regards

Why can't hardware take care of multipasses on its own?
Because, when it comes down to it, it would be a really bad idea.

The ARB considered requiring glslang implementations to accept any valid shader, to completely virtualize all hardware limitations and just let the driver figure out how to handle it. Idealistic.

Idealism must give way to practicality. The fact is, hardware really needs to expose those limits to the user. A 5600 can't do looping at all in the fragment shader, but a 6600 can. A shader based significantly on looping will be virtually unusable on the 5600; you may as well not have bothered to compile it.

Furthermore, the program itself has no indication that the shader is sub-optimal for said hardware. As such, it can’t know that the 5600 compiled shader is going to be horrifically slow without trying it first. Trial-and-error is not the best way to tell whether something is going to work well.

Now, let's say you, as the developer, are aware that the 5600 shader will be really slow, and the 6600 one will be fast enough. So you develop an approximation that you would like to use on the 5600, something that looks good, but not as good as it could be.

Well, now what? You have no way to tell if you’re executing code on a 5600 or not.

By exposing limits, you allow a program to provide alternative shaders for various levels of hardware.

Oh, a driver could potentially virtualize hardware limits (or, some of them, at least. I seriously doubt that virtualizing and multipassing on the number of attributes or varyings is even possible for all shaders). But it would create more problems than it solved.
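
To make that concrete, here's a rough sketch of the kind of path selection those exposed limits enable. The shader sources and the threshold below are made up, and since GLSL has no direct "can it loop" query, the fragment uniform limit stands in as a crude proxy:

#include <GL/gl.h>
#include <GL/glext.h>   // GL_MAX_FRAGMENT_UNIFORM_COMPONENTS needs GL 2.0-level headers

// Hypothetical shader sources for two hardware tiers.
extern const char* loopHeavySource;      // the "6600-class" shader
extern const char* approximationSource;  // the cheaper "5600-class" fallback

const char* pickFragmentShaderSource()
{
    GLint maxFragUniforms = 0;
    glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, &maxFragUniforms);

    // Crude, made-up threshold: a small limit suggests older hardware, so take
    // the approximation instead of discovering at runtime that the expensive
    // shader is horrifically slow.
    return (maxFragUniforms >= 1024) ? loopHeavySource : approximationSource;
}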

doing soft shadows is equally hard on a raytracer.
No, it isn’t.

It’s expensive, but it’s not hard. Indeed, it just requires firing more shadow rays. Whereas, in scan conversion, you have to come up with entirely new shadow algorithms.
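
Roughly, the "just fire more shadow rays" version looks like this; Vec3, Ray, sampleAreaLight() and sceneOccluded() are stand-ins for whatever the ray tracer already provides:

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Hypothetical hooks into the ray tracer: a random point on the area light,
// and an any-hit occlusion test against the scene.
Vec3 sampleAreaLight();
bool sceneOccluded(const Ray& shadowRay);

// Fraction of the area light visible from hitPoint: fire numSamples shadow
// rays toward sampled points on the light and count how many get through.
float softShadowFactor(const Vec3& hitPoint, int numSamples)
{
    int visible = 0;
    for (int i = 0; i < numSamples; ++i)
    {
        Vec3 p = sampleAreaLight();
        Ray shadowRay = { hitPoint, { p.x - hitPoint.x, p.y - hitPoint.y, p.z - hitPoint.z } };
        if (!sceneOccluded(shadowRay))
            ++visible;
    }
    return (float)visible / (float)numSamples;
}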

btw: I'm still hoping to get Korval's feedback on the post starting with:
I don’t know what you want me to say. I explained what I was talking about, and you seemed to get it. Whether your code was correct or not, I can’t say offhand, but that was the general idea.

I seriously doubt that virtualizing and multipassing on the number of attributes or varyings is even possible for all shaders
With the arguments brought to the table… I understand why now… well, count me as one demanding more transistors.

ideally:

count = GL_MAX_TEXTURE_IMAGE_UNITS = 16 // texture count based algo.

GL_MAX_LIGHTS = count
GL_MAX_VARYING_FLOATS = count * 2
GL_MAX_VERTEX_ATTRIBS = count // no overlaps with standard attributes on nvidia.
GL_MAX_VERTEX_UNIFORM_COMPONENTS = count * count * 2
GL_MAX_PROJECTION_STACK_DEPTH = count / 2
GL_MAX_TEXTURE_STACK_DEPTH = count
GL_MAX_COLOR_MATRIX_STACK_DEPTH = count

This would probably set the standard for a one-pass rendering pipeline as far as I'm concerned…

here is the “Guru Pretender” call SkeltonMesh was waiting for:

Furthermore, unlikely to happen unfortunately, but: push the hw to its limits and draw the line between "it is physically not possible yet" and "we can do it but it would be expensive"… at least developers could work on yet-to-come mainstream hardware… while cooking 1-3 year projects, for instance… the title could be up to date at release and for years after…

I don’t know what you want me to say. I explained what I was talking about, and you seemed to get it. Whether your code was correct or not, I can’t say offhand, but that was the general idea.
Fair enough!

Thx again gentlemen!

Here’s what you’re not understanding.

All of these constants:

GL_MAX_LIGHTS = count
GL_MAX_PROJECTION_STACK_DEPTH = count / 2
GL_MAX_TEXTURE_STACK_DEPTH = count
GL_MAX_COLOR_MATRIX_STACK_DEPTH = count

are meaningless. They are from driver-side stuff that you could just as easily do yourself. They are not stopping you from doing what you want.

The number of attributes, varyings, and uniforms are the actual hardware restrictions.
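
You can query those directly (assuming GL 2.0-level headers for the tokens):

#include <cstdio>
#include <GL/gl.h>
#include <GL/glext.h>

void printRealLimits()
{
    GLint maxAttribs = 0, maxVaryings = 0, maxVertexUniforms = 0, maxFragmentUniforms = 0;
    glGetIntegerv(GL_MAX_VERTEX_ATTRIBS,              &maxAttribs);
    glGetIntegerv(GL_MAX_VARYING_FLOATS,              &maxVaryings);
    glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS,   &maxVertexUniforms);
    glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, &maxFragmentUniforms);
    printf("attribs %d, varyings %d, vertex uniforms %d, fragment uniforms %d\n",
           maxAttribs, maxVaryings, maxVertexUniforms, maxFragmentUniforms);
}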

Originally posted by Korval:
Well, now what? You have no way to tell if you're executing code on a 5600 or not.
glGetString(GL_RENDERER) should return GeForce 5600 along with other substrings in there, so you are technically wrong.

At least GLSL offers some things that can be queried.
One can also query for info using ARB_vp and ARB_fp. If you have the NV extensions, query those.

Those numbers give an idea on which path your engine should take.

I like those low-level shaders. They offered a lot of low-level info. They told you when a shader isn't native (runs in software), how many temps, how many parameters, environment parameters, branching depth.
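
For example, with ARB_fragment_program, something along these lines; it assumes glGetProgramivARB has been loaded through the usual extension mechanism and a program is currently bound:

#include <GL/gl.h>
#include <GL/glext.h>   // ARB_fragment_program tokens and PFNGLGETPROGRAMIVARBPROC

// Assumed to be loaded elsewhere via the platform's extension mechanism.
extern PFNGLGETPROGRAMIVARBPROC glGetProgramivARB;

void checkBoundFragmentProgram()
{
    GLint underNativeLimits = 0, nativeTemporaries = 0, nativeParameters = 0;
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &underNativeLimits);
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_PROGRAM_NATIVE_TEMPORARIES_ARB,  &nativeTemporaries);
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_PROGRAM_NATIVE_PARAMETERS_ARB,   &nativeParameters);

    if (!underNativeLimits)
    {
        // The program exceeds a native resource limit and will run in software.
    }
}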

The number of attributes, varyings, and uniforms are the actual hardware restrictions.
got it!

Hmm… interesting… custom matrix stacks are one of the keys… still have no clue how, but I'll dig into this… I'm doing an export/import plugin atm, but… I'll be back. :cool:

thx for all the great inputs!

cheers

#include <stack>
#include <GL/gl.h>

// Minimal 4x4 matrix type so it can be stored by value in a std::stack.
struct mat4 { GLfloat m[16]; };

static mat4 currentMatrix;                // the matrix currently loaded in GL
static std::stack<mat4> matrixStack;      // software stack, no fixed depth limit

// Named my* instead of gl* so they don't collide with the real GL entry points.
void myPushMatrix()
{
    matrixStack.push(currentMatrix);      // save the current matrix
}

void myPopMatrix()
{
    currentMatrix = matrixStack.top();    // read the saved matrix before popping
    matrixStack.pop();
    glLoadMatrixf(currentMatrix.m);       // hand it back to GL
}

Originally posted by V-man:
glGetString(GL_RENDERER) should return GeForce 5600 along with other substrings in there, so you are technically wrong.

The problem is that the format of that string is not standardized, so it may theoretically change between drivers and is not required to contain a reference to the hw type at all. You also have to make sure that your parsing will not mistakenly include a future "GeForce 56000" or exclude a "GeForce 5600XT", "GeForce 5600 Ultra", or "GeForce 5600/AGP".
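
A quick sketch of why naive substring matching on that string is fragile:

#include <cstring>
#include <GL/gl.h>

bool looksLikeGeForce5600()
{
    const char* renderer = (const char*)glGetString(GL_RENDERER);
    if (!renderer)
        return false;

    // strstr also matches "GeForce 5600XT", "GeForce 5600 Ultra",
    // "GeForce 5600/AGP" and a hypothetical "GeForce 56000".
    return std::strstr(renderer, "5600") != NULL;
}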