Official OpenGL mechanism for vertex/pixel "shaders"

Further, on the subject of programmability in GPUs, I think it’s obvious that someday the benefits of having a fully programmable, general-purpose GPU will be clear, and the hardware manufacturers will start producing them. I think the cost in speed this would incur has not yet been outweighed by the benefits in flexibility. I assume that someday we will just see something like the removal of the limit on the number of operations in vertex programs and pixel shaders.

In the long run, though, it seems to me that once we have fully programmable GPUs, limiting them to just graphics will seem ludicrous. What if you could encode your physics algorithms as a “vertex program” and let the card process your physics for you? If we ever see floating-point framebuffers and a fully generalized pipeline, then I think GPUs will become applicable to far more diverse problems than just graphics. At that point, though, it makes sense to move towards a new computer architecture in which a scalar CPU and a massively SIMD GPU work together as multiprocessors, with shared and exclusive memory for each. Programs would consist of machine code for both instruction sets, designed to work in tandem. Or maybe this kind of massively parallel computation will become part of standard CPUs, so that we move back to the days when the CPU was responsible for all graphics work.

My logic for this is that, as GPUs become more flexible, we will want to apply them to more problems, and the CPU<=>GPU bus will continue to be the limiting factor, until the only suitable solution is to bring the two closer together…

I agree.

It does seem that pretty soon GPUs will be very similar to a massively SIMD CPU with a couple of special instructions added in.

Sort of sets the stage for a return to assembler programming, until compilers catch up with this.

If it happens, of course.

j

Originally posted by j:
[b]I agree.

It does seem that pretty soon GPUs will be very similar to a massively SIMD CPU with a couple of special instructions added in.

Sort of sets the stage for a return to assembler programming, until compilers catch up with this.

If it happens, of course.

j[/b]

This is not something new. Don’t you remember the TIGA graphics boards with the Texas Instruments 34010 chip (10 years ago)? It was a 2D graphics chip with an assembler and a C compiler. I made a special-purpose CAD program for map engineering using it, and it was fun and very powerful. I remember an example that came with the compiler that let you draw triangles, and another example built on it that drew a non-textured 3D environment in real time at 1024x768 (it was 10 years ago, so it was very impressive).

Omnibus reply follows…

I continue to insist, as before, that the whole “2D image quality” thing is something that I know very little about. I will repeat my previous statement:

*** begin quote from previous message ***
I’m not saying you’re wrong about 2D image quality, but what I’m saying is that I’ve heard so many contradictory claims on the subject that I don’t know who to believe any more.
*** end quote from previous message ***

For lists of features, our extensions document does document the set of extensions supported on each chip.

I’d also say that the ATI page you linked to is little more than dressed-up marketing material. The images in that document are all straight out of Radeon marketing slides/whitepapers. The document glosses over all sorts of inconvenient details and even makes misleading or false statements in a number of places. I could provide a long list of examples from that document alone. If you want the real information (or an approximation thereof), again, you have to go to their developer pages.

On the IP issues: yes, perhaps we’re being overly cautious, but in this incredibly cutthroat industry, we have no choice. And as Mark said, it’s standard practice, and we have also offered to license the extension on what we think are reasonable terms.

It’s massively oversimplifying DX8 to claim that it reduces everything to vertex buffers, vertex shaders, and pixel shaders. First of all, I could just as well claim that in the future, OpenGL is going to be all about vertex arrays, vertex programs, and register combiners. The two statements are, in fact, almost equivalent! Secondly, both of them ignore the fact that there is still a lot of API functionality outside those areas. Pixel shaders/register combiners totally ignore the backend fragment operations. The viewport transform, primitive assembly, clipping, rasterization, and interpolation are still quite alive. And so on.

In terms of future system architectures, I think the XBox is a good example of where things are headed. The XBox has three main chips: CPU, GPU, and MCP, with a unified memory architecture – all the memory hangs off the GPU, so there’s no dedicated system memory, video memory, or audio memory. The GPU and MCP are optimized for specific functions, while the CPU handles “everything else”.

If you make the GPU too programmable, it becomes nothing more than a CPU. So programmability must be limited.

  • Matt

Don’t you remember the TIGA graphics boards with the Texas Instruments 34010 chip (10 years ago)? It was a 2D graphics chip with an assembler and a C compiler.

No, I don’t remember that. Maybe that’s because I didn’t even own a computer then.

If you make the GPU too programmable, it becomes nothing more than a CPU. So programmability must be limited.

I wasn’t saying that I think a GPU should do what a CPU does.
What I am saying is that with seemingly everybody asking for more and more flexibility, the assembly language used in GPU pixel shaders might end up being similar to a CPU assembly language in the types of instructions it supports.

Correct me if I’m wrong, but register combiners seem to be like the way people programmed a long time ago, on some of the first computers. Take the inputs, choose one of a couple of available operations on them, and then output the result; repeat for however many combiners you have. It’s true that you have all the input scaling and biasing options, but you only have about 5 or 6 basic operations.

I’m not saying that I think register combiners are worthless, or that you can’t do anything with them. And I don’t think that the pixel pipeline should be completely user programmed.

But I do think that substituting a user writeable script in for the portion of the pipeline where the combiners are now could give an amazing amount of flexibility. Something like vertex programs.
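
For comparison, here’s a rough sketch of what the vertex-side version of that idea already looks like with NV_vertex_program (just the minimal load-and-enable sequence, not production code; it assumes the extension’s entry points have already been fetched, e.g. via wglGetProcAddress):

[code]
#include <string.h>   /* strlen */
#include <GL/gl.h>    /* plus the NV_vertex_program enums/entry points from glext.h */

/* Minimal vertex program: transform the position by the tracked
   modelview-projection matrix in c[0]..c[3] and pass the color through. */
static const GLubyte vp[] =
    "!!VP1.0\n"
    "DP4 o[HPOS].x, c[0], v[OPOS];\n"
    "DP4 o[HPOS].y, c[1], v[OPOS];\n"
    "DP4 o[HPOS].z, c[2], v[OPOS];\n"
    "DP4 o[HPOS].w, c[3], v[OPOS];\n"
    "MOV o[COL0], v[COL0];\n"
    "END\n";

static GLuint load_passthrough_vp(void)
{
    GLuint id;

    glGenProgramsNV(1, &id);
    glLoadProgramNV(GL_VERTEX_PROGRAM_NV, id,
                    (GLsizei)strlen((const char *)vp), vp);
    glBindProgramNV(GL_VERTEX_PROGRAM_NV, id);

    /* Track the concatenated modelview-projection matrix into c[0]..c[3]. */
    glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0,
                    GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV);

    glEnable(GL_VERTEX_PROGRAM_NV);
    return id;
}
[/code]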

j

The combiners are essentially a VLIW instruction set, where you have to do scheduling yourself in your app. For example, you can schedule two RGB multiplies or dot products to occur in parallel by putting one in AB and one in CD. You can also schedule scalar and vector ops by putting the vectors in RGB and scalars in alpha. So in total, the engine can perform 2 vector and 2 scalar ops “per cycle” (for a very loose definition of per cycle; don’t try to read too far into my use of this term). The final combiner adds some extra power and complexity to the mix.
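
To make that concrete, here is a rough sketch (illustrative values only, not from any shipping app) of packing two independent RGB ops into a single general combiner stage, with the final combiner multiplying the two results together:

[code]
/* Sketch: one general combiner stage doing two RGB ops "in parallel".
   AB slot: tex0 dot tex1 (e.g. a bumped N.L term, inputs expanded to [-1,1]).
   CD slot: primary color * constant color 0 (an independent modulate).
   Final combiner: spare0 * spare1.
   Assumes the NV_register_combiners entry points are available. */
static const GLfloat material[4] = { 1.0f, 0.9f, 0.8f, 1.0f };  /* made-up value */

glEnable(GL_REGISTER_COMBINERS_NV);
glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 1);
glCombinerParameterfvNV(GL_CONSTANT_COLOR0_NV, material);

/* AB: A = texture0, B = texture1, expanded for a signed dot product. */
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                  GL_TEXTURE0_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                  GL_TEXTURE1_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);

/* CD: C = primary (diffuse) color, D = constant color 0, plain multiply. */
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_C_NV,
                  GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_D_NV,
                  GL_CONSTANT_COLOR0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);

/* AB is a dot product into spare0, CD is a product into spare1, no sum. */
glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB,
                   GL_SPARE0_NV, GL_SPARE1_NV, GL_DISCARD_NV,
                   GL_NONE, GL_NONE, GL_TRUE, GL_FALSE, GL_FALSE);

/* Final combiner computes A*B + (1-A)*C + D; with C = D = zero that is
   simply spare0 * spare1. */
glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_SPARE0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_SPARE1_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_C_NV, GL_ZERO,      GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_D_NV, GL_ZERO,      GL_UNSIGNED_IDENTITY_NV, GL_RGB);
[/code]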

We could have used an interface similar to LoadProgramNV and written an instruction scheduler inside our driver, but when there are only 2 general combiner stages and the single final combiner, you could write a 3-instruction or 4-instruction program that we could have failed to schedule, yet on the other hand you could write a >10-instruction program that would schedule into those combiners.

At some point in the future, I would like to abstract away the combiners a bit. But at the time, it was the right interface, and I think it will still be the right interface for quite a while longer.

In the meantime, you could write an app-level library that would do this kind of scheduling.
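
If anyone wants to try, a purely hypothetical interface for such a library (none of these names exist anywhere; they are invented just to show the shape of the problem) might look something like:

[code]
/* Hypothetical app-level combiner scheduler -- names invented for
   illustration only.  The idea: the app hands over a short instruction
   list, and the library tries to pack it into the available general
   combiner stages (AB/CD slots, RGB vs. alpha portions), then issues the
   actual glCombinerInputNV/glCombinerOutputNV calls. */
typedef enum { RC_MUL, RC_DOT3, RC_MUX, RC_SUM } rc_opcode;

typedef struct {
    rc_opcode op;
    GLenum    dst;       /* e.g. GL_SPARE0_NV */
    GLenum    src[2];    /* e.g. GL_TEXTURE0_ARB, GL_PRIMARY_COLOR_NV */
} rc_instr;

/* Returns 0 and programs the combiners if the list fits into `num_stages`
   general combiners plus the final combiner; returns nonzero if it does
   not schedule (which, as noted above, is not simply a function of length). */
int rc_schedule(const rc_instr *prog, int count, int num_stages);
[/code]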

  • Matt

yeah, combiners are pretty neat stuff.
i think it’s time for me to actually write a demo where the combiners are doing a light pass, a la no lightmap, on actual level geometry instead of standard/extended primitives…
see how that works out.

i wonder if i could pull off some neat toon rendering.
probably be quicker for high-poly mesh stuff.
right now my current renderer just uses the simple intel technique… http://www.angelfire.com/ab3/nobody/toonmodel1.jpg

but my light sources are messed ;/

dunno, probably will try them both out (level geometry and cartoon) with combiner stages.

laterz,
akbar A.

>At this point taking advantage of vertex arrays, multitexture, complex blend modes and shading operations, and programmable per-vertex math leaves one writing code consisting only of extensions, it seems

another thing to note is that these extensions and the extra code passes are very well worth it.

there is a big enough jump on the nvidia and ati cards that if you don’t support them you’re really missing out.

honestly, if you’re just in it to make or ship a game you don’t have to worry, ’cause there are still a few years till games will ‘start’ using some of the cooler features…
BUT, if you want to make cutting edge stuff, this is the only way…

example:
jason mitchell of ati was telling me that a lot of the developers really shy away when it comes to supporting some of the more complicated/non-trivial code passes…

laterz,
akbar A.

>we do feel that the API chosen by DX8 and
>in NV_vertex_program is the right API for
>this functionality.

i remember cass was planning to organize a chat about the feature right after they came out, on the opengladvanced list (on egroups).
but i don’t think anyone got around to it…

are there any more papers/demos available besides the 75-page spec and the ppt?

laterz,
akbar A.

Matt wrote:

ATI also doesn’t advertise their [volume texture] trilinear restriction very openly… I believe they use two texture units to do trilinear, so you can do 1 trilinear and 1 bilinear but not 2 trilinear, and if this is correct, it actually constitutes a small cheat on their part.

False. This is not correct for Radeon, nor was it correct on the Rage128, back when TNT was doing such a cheat.

Bringing you your daily dose of FUD

Please don’t. There’s enough to sort through.

-JasonM at ATI

Don’t be put off by my bad English, I’m French!

I think that most programmers are waiting for tools that provide an API for advanced features. Why use OpenGL if we have to redo the engineering work from the ground up? We’d be better off using DX8. That’s the question.

I think that for an experienced programmer it should be easy to write an all-in-one extension on top of OpenGL which provides features similar to DX8’s. The goal of such a tool is not to offer a function like open_3ds_and_display_this but bring_me_the_mathematics_and_knowledge_that_i_havent_time_to_spend_for.
Such a function cannot expose the most specific features of each piece of hardware, but it would be a good way to avoid the big difference between 200-line DX8 demos that provide per-pixel shading with bump mapping and advanced vertex shading, and 5000-line OpenGL ones which can only display basic rendering and contain huge amounts of mathematics.

In other words, when will nVidia provide an OpenGL SDK?

We are a very small company, so having a programmer spend 6 months learning the right way to get the best effects out of nVidia cards is too expensive. In a few years we may have the money to spend time learning the core architecture of the hardware, but for now we prefer the simplest route: jumping to DX8. And what about independent programmers and students?

So the priority for a vendor seems to be to provide simplification layers that let you start using the most effective features (bump mapping, per-pixel shading, simple vertex shading) in a very short time.

No?

My opinion is not that nVidia and others should build a 3D engine, but that they should encapsulate their advanced knowledge in a library of specific functions and extensions.

Gabriel RABHI / Z-OXYDE / France

Originally posted by gaby:
200-line DX8 demos that provide per-pixel shading with bump mapping and advanced vertex shading, and 5000-line OpenGL ones which can only display basic rendering and contain huge amounts of mathematics.

I think it’s pretty clear, if you look at MS’s DirectX samples, that it’s usually the exact opposite. MS often does 5000-line samples to do what could be done in 200 lines in OGL (and probably 500 lines with D3D by a reasonably efficient coder).

  • Matt

Just a little contribution to this overheated thread (can someone douse that fire on the icon?).

It is true that developing extension-specific code in OpenGL is a bit of a drag but, let’s be honest, even if it seems like DirectX 8.0 has all the pixel and vertex shading capabilities as standard, it’s only hardware accelerated on a very small number of video chips on the market. Anyone seriously developing an app with DirectX 8.0 will first check to see if the hardware supports the feature and, if not, will use his own custom software alternative. Said differently, in OpenGL you check if the extension exists before using it, whereas with DirectX you do it the other way around: is the feature accelerated? In the end it turns out to be the same… except that with DirectX you can expect that in the long run every video card on the market will fully support its features. But who knows if the NV-specific (or any card vendor’s, for that matter) extensions will be supported by all, or even end up as ARB extensions (let’s not even speak of becoming standard OpenGL)?
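
For what it’s worth, the OpenGL side of that check is just a lookup in the GL_EXTENSIONS string; here is a minimal sketch (it needs a current GL context, and note that a bare strstr can be fooled by extension names that are prefixes of longer ones):

[code]
#include <string.h>
#include <GL/gl.h>

/* Sketch: return non-zero if `name` appears as a complete token in the
   GL_EXTENSIONS string.  Checking the surrounding delimiters avoids false
   hits on extensions whose names are prefixes of other names. */
int has_extension(const char *name)
{
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    const char *p;
    size_t len = strlen(name);

    if (ext == NULL || len == 0)
        return 0;

    for (p = ext; (p = strstr(p, name)) != NULL; p += len) {
        int starts_token = (p == ext) || (p[-1] == ' ');
        int ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}

/* e.g.  if (has_extension("GL_NV_vertex_program")) { ...use it... } */
[/code]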

>honest, even if it seems like DirectX 8.0
>has all the pixel and vertex shading
>capabilities as standard, it’s only hardware
>accelerated on a very small number of video
>chips on the market

exactly.
that is why we should all use opengl extensions.
see this for more detail. www.angelfire.com/ab3/nobody/geforce.txt

with d3d we have to wait for the cycle, and that sucks.

laterz,
akbar A.