Radeon ATI_vertex_transform?

http://www.ati.com/na/pages/resource_centre/dev_rel/gdc/ATIGDC2001Vertex.pdf

The above link talks about the ATI_vertex_transform extension, but only gives half a page of sample code and no details. It appears that ATI does NOT intend to provide the full DirectX 8 vertex shader language for OpenGL users. However, I can’t figure out how much equivalent functionality they do plan to provide. There is no description of ATI_vertex_transform on the ATI developer resources page – can someone provide a better description of the extension?

I spoke with one of their GL-driver guys in Marlborough, MA, who said they will expose full, if not further, ‘vertex processing’ functionality soon. Like you, however, I’ve seen nothing from them on this topic so far. ATI has yet to commit resources to developer relations the way other chip companies have, but they supposedly have some exciting stuff in the pipeline, and the Radeon is the only 3D-texture card out, so I hope they do expose their capabilities soon.

If you look at the way the GDC presentation code is structured (on page 8 of the PDF my URL points to) you’ll see that they use a register-setter-like interface instead of the DX8-like text compiler interface.

Looking a little closer at what they’re doing, I suppose you could write a piece of code that takes an NV_vertex_program text string (a.k.a. a “DX8-like vertex shader description”) and turns it into the correct sequence of calls – assuming there are enough flavors and tokens of TransformOp.
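If there are, a little translator is straightforward. Here is a minimal sketch assuming the state-setter style shown in the PDF – TransformOp3ATI() and the token values are invented stand-ins, since the real entry points and enums aren’t public:

```c
#include <stdio.h>
#include <string.h>

/* Invented stand-in for the real extension entry point: */
static void TransformOp3ATI(unsigned op, unsigned dst, unsigned src0, unsigned src1)
{
    printf("TransformOp3(op=%u, r%u, r%u, r%u)\n", op, dst, src0, src1);
}

/* Map NV_vertex_program mnemonics to hypothetical TransformOp tokens: */
static const struct { const char *mnemonic; unsigned op; } op_table[] = {
    { "DP4", 1 }, { "MUL", 2 }, { "ADD", 3 },   /* token values invented */
};

/* Emit one parsed "OPCODE dst, src0, src1" line as a setter call: */
static void emit_op(const char *mnemonic, unsigned dst, unsigned src0, unsigned src1)
{
    size_t i;
    for (i = 0; i < sizeof op_table / sizeof op_table[0]; ++i)
        if (strcmp(op_table[i].mnemonic, mnemonic) == 0) {
            TransformOp3ATI(op_table[i].op, dst, src0, src1);
            return;
        }
}
```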

Speaking of which: is there a bug in the code snippet they posted? It seems to me that the first two TransformOp2 calls should really be TransformOp3?

Well, I think the register mode is best.
It allows one to make dynamic changes (a.k.a. mixing and matching ‘vertex shaders’ on the fly) as well as tight customization to a particular chipset. With ‘precompiled’ text, you’re still stuck dealing with run-time compiling (or NVASM binding) for dynamic vertex shaders, and you’ll still need to customize for a given chipset (depending on texture units, number of passes possible, etc.)

So from what I can see, ATI’s approach for vertex handling is optimal in the way that RegisterCombiners/TextureShaders are optimal for Nvidia. I’d like to see Nvidia expose a register/instruction set for their vertex programs so I can generate pieces at runtime-

OK, this is my second attempt to reply to this thread. I hope it works.

I’ll start off by saying that I can’t answer every possible question you might have. This spec is still under development. We are presently working with others under the ARB participants agreement in hopes we can ship the extension in a form that multiple other vendors can support.

The sample code you are looking at is from an early version of the spec. It is included primarily to convey the concepts. It does contain a bug, but the lines you are looking at are correct. The ones below them should be TransformOp1. (oops)

Originally posted by V:
So from what I can see, ATI’s approach for vertex handling is optimal in the way that RegisterCombiners/TextureShaders are optimal for Nvidia. I’d like to see Nvidia expose a register/instruction set for their vertex programs so I can generate pieces at runtime-

I personally don’t mind the register combiner method of making function calls to set parameters (nor do I mind the program method, so I’m a bit indifferent). However, in talking to some dev-rel people from Nvidia, it seems they got a number of complaints (or should we say constructive criticism?) from professional developers who did NOT like doing things that way.

There are definitely some issues with the register combiners method of doing things.

For one, register combiners code is virtually unreadable and unmaintainable. It doesn’t help that some of the arguments are booleans rather than enums.

  • Matt

Originally posted by V:
I’d like to see Nvidia expose a register/instruction set for their vertex programs so I can generate pieces at runtime-

Umm… isn’t that EXACTLY what NV_vertex_program does?
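It takes the program as a plain text string and compiles it when you load it – roughly like this (a sketch using the NV_vertex_program entry points; error handling omitted):

```c
/* The program text is "compiled" at glLoadProgramNV() time, i.e. at run-time: */
static const GLubyte prog[] =
    "!!VP1.0\n"
    "DP4 o[HPOS].x, c[0], v[OPOS];\n"   /* clip position = tracked matrix  */
    "DP4 o[HPOS].y, c[1], v[OPOS];\n"   /* rows dotted with the incoming   */
    "DP4 o[HPOS].z, c[2], v[OPOS];\n"   /* object-space vertex             */
    "DP4 o[HPOS].w, c[3], v[OPOS];\n"
    "MOV o[COL0], v[COL0];\n"           /* pass the vertex colour through  */
    "END";

GLuint id;
glGenProgramsNV(1, &id);
glLoadProgramNV(GL_VERTEX_PROGRAM_NV, id, sizeof(prog) - 1, prog);
glBindProgramNV(GL_VERTEX_PROGRAM_NV, id);
/* Track modelview*projection into c[0]..c[3]: */
glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0, GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV);
glEnable(GL_VERTEX_PROGRAM_NV);
```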

I’ve thought this issue through, and I think that the suggested ARB version can be as powerful as the nVidia “assembler” set, and easier to map to specific hardware implementations to boot, and thus might be a preferable way. Assuming the ARB guys do their job right, which I have no reason to doubt.

In reality, if I were writing shaders using this state accumulation API, I would probably have my own little text language, or table of function pointers, to assemble the program. However, leaving this level to the programmer is fine, just like OpenGL leaves the management of mesh data to the programmer.

I also agree with Matt that the register combiner way of specifying things is hard to read. There was some sample code that took a text description and spat out the actual state definitions; again, you can do this on your own and only have to write and debug the combiner-issuing code once.

ehart, glad to see someone in the know here, and maybe with some ability to affect things. So here are two comments about vertex shading extensions:

First, I hope that there are plans to make this extension available on chips that do not support T&L in hardware, like the Rage 128 and Rage Pro (/Mobility), and (not ATI, but from ARB members, AFAIK) the Riva TNT and Intel 810. I think that for vertex shaders to really become standard, the extension needs to work on as many chips as possible. Direct3D provides such software emulation.

Secondly, how about partial, per-attribute vertex shaders? By this I mean replacing texture coordinate calculations, colour (material) calculations, and vertex location calculations separately (maybe even finer granularity – replacing just fog calculations, for example). Overriding all vertex processing with one program seems to me to add unnecessary complexity. I can think of more uses for overriding just texture coords or lighting without altering vertices, or altering vertices and normals without altering material properties. Separating this functionality also allows programs to be more modular (allowing the same shading effect to be applied to both predefined and shader-generated geometry). Since I didn’t find an extension spec, only the example in the presentation, I don’t know if this has been considered. I think it shouldn’t be difficult to add.
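To make the suggestion concrete, this is the sort of interface I have in mind – purely hypothetical, none of these names exist in any real extension:

```c
/* Hypothetical per-stage binding; 0 would mean "keep the fixed function": */
glBindVertexStageProgramXXX(GL_TEXGEN_STAGE_XXX,   texcoordProg); /* texcoords only */
glBindVertexStageProgramXXX(GL_LIGHTING_STAGE_XXX, materialProg); /* lighting only  */
glBindVertexStageProgramXXX(GL_POSITION_STAGE_XXX, 0);            /* fixed function */
```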

Originally posted by jwatte:
Umm… isn’t that EXACTLY what NV_vertex_program does?

Hi jwatte,
What I’m referencing is the difference between instruction sets that have to be compiled, such as DX8 vertex/pixel shaders & NV_vertex_program, versus instruction sets that are dynamically configured via states, such as NV_register_combiners and how ATI_vertex_transform currently looks (although given the limited info on the ATI spec, that’s questionable at best).

I agree that ‘state-driven’ access is often more difficult to read natively, but most of that can be heartily cleaned up via namespaces or clever use of macros.
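For instance, a couple of macros go a long way. The glCombinerInputNV call below is the real NV_register_combiners entry point; the macro name is just my own:

```c
/* Wrap the verbose combiner-input call so the intent reads off the page: */
#define RC_RGB_INPUT(stage, var, src)                              \
    glCombinerInputNV(GL_COMBINER##stage##_NV, GL_RGB,             \
                      GL_VARIABLE_##var##_NV, (src),               \
                      GL_UNSIGNED_IDENTITY_NV, GL_RGB)

/* Combiner 0 computes A*B = tex0 * primary colour
 * (output routing via glCombinerOutputNV omitted): */
RC_RGB_INPUT(0, A, GL_TEXTURE0_ARB);
RC_RGB_INPUT(0, B, GL_PRIMARY_COLOR_NV);
```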

However, the real value of ‘state-driven’ mechanisms over ‘compiled’ instructions is that it is often cheaper/easier to dynamically & programmatically alter them.

For applications that have very fixed functionality – e.g. you know ahead of time that your app will only use 16 vertex/pixel shaders – using a mechanism like DX8/NV_vertex_program is excellent.

However, if your app dynamically creates/combines shaders, with a lot of mixing and matching of shader fragments, then a state-driven mechanism is a much better fit.

In a rough/arguable way, the difference seems to me a bit like Display Lists vs. Vertex Arrays. But that’s not a very accurate comparison.

In any case, the diversity of current and upcoming chipsets will force us to implement a variety of rendering pathways. It’s those different coding pathways that will tell the real answers to this entertaining but currently somewhat empty speculation on rendering API methods.

BTW, just wanted to comment that the first consumer-level chip to implement 3D textures was 3Dlabs’ R3 / Permedia 3 (used in the Oxygen VX1 and Permedia 3 Create!). It was also the first, I think, to include dot3 bump mapping. Unfortunately, its fill rate wasn’t high enough (especially for the price) to be considered by gamers. And nobody found these features that interesting when the chip was introduced.

Originally posted by ET3D:
BTW, just wanted to comment that the first consumer level chip to implement 3D textures was 3Dlabs’ R3 / Permedia 3

Cool. Wonder if it supported mip-mapping, unlike Radeon? I should’ve said Radeon is the only consumer hardware that I know of that does 3D textures-

Additional note on this thread-
I re-read ATI’s docs, and they have a phrase that captures the difference between text-compiled programs and ‘state-configured’ programs: “Permutation Management.”

For very dynamic universes – say, where one minute an object has an image and shadows projected on it and the next minute it has water splashed on it and is under sparkling atmospheric effects – one really needs “Permutation Management” to configure the vertex/pixel shading for the desired visuals.

Can you point me to the ATI docs? I must be blind or something, but I couldn’t find them (except the GDC presentation linked in this thread).

> However, the real value of ‘state-driven’ mechanisms over ‘compiled’ instructions is that it is often cheaper/easier to dynamically & programmatically alter them.

Perhaps it’s my tools/compilers background, but I don’t buy that. Ever heard of sprintf()? :)
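Shader text is just bytes, so it can be assembled at run-time like anything else. A rough sketch – numTexUnits stands for whatever your engine tracks, and this emits only a fragment, not a complete program:

```c
/* Build "MOV o[TEXn], v[TEXn];" for each enabled unit at run-time: */
char prog[512];
int i, n = sprintf(prog, "!!VP1.0\n");
for (i = 0; i < numTexUnits; ++i)
    n += sprintf(prog + n, "MOV o[TEX%d], v[TEX%d];\n", i, i);
sprintf(prog + n, "END");   /* a real program also needs its o[HPOS] writes */
```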

The main difference between the current state-pushing API (from what we know of it) and the text-compiling API (which is well documented) is that the former allows higher-level concepts (such as “multiply by modelview matrix”) to be expressed as an atomic unit. In the text-based version, you have to emit that as a series of four DOT4 instructions. However, for most intents and purposes, they are “the same” (in that you can write a program which transforms one into the other, although the text-to-stateops program would probably be more complicated).

So, that being said, I would like there to be a text-based representation that allowed the higher-level ops found in the state-based representation; that would get us the best of both worlds :) Of course, with a state-based thing, creating a text-based representation of it is almost trivial…

Originally posted by ET3D:
Can you point me to the ATI docs? I must be blind or something, but I couldn’t find them (except the GDC presentation linked in this thread).

Sorry ET3D, when I mentioned ‘docs’ I was referring to the GDC presentation mentioned at the top of the thread-
http://www.ati.com/na/pages/resource_centre/dev_rel/gdc/ATIGDC2001Vertex.pdf

Originally posted by jwatte:
> However, the real value of ‘state-driven’ mechanisms over ‘compiled’ instructions is that it is often cheaper/easier to dynamically & programmatically alter them.

Perhaps it’s my tools/compilers background, but I don’t buy that. Ever heard of sprintf()? :)

Hi jwatte,
Guess I didn’t make my point clear about ‘Permutation Management’ being better with ‘state-driven’ changes than with ‘compiled symbols.’ Note:: this only applies when the formula for an object’s shading changes dynamically over time.

I’ll try an example using NV_register_combiners as a ‘state-driven’ example and DX8 pixel shaders as a ‘compiled symbols’ example.

To make this example simple, we’ll assume all the vertex processing is done in custom software and we’re just interested in the pixel processing.

If we have an object that gets a decal + diffuse & specular bumpmap + 1 shadowmap, we can easily write a DX8 pixel shader for it, and – less easily, until we’ve written some decent support macros – code it in register combiners.

All set. But let’s say that on the fly we may add a second or third light, which changes the number of shadows and affects the specular bumps. We’ll need to write another set of shaders for 2 lights and another for 3 lights (yes, arguably we could code a pixel shader for 3 lights and use bogus values to simulate just 1 or 2, but that’s rather impractical).

Now let’s say that each light can project its image, in ‘slideshow projector’ fashion. Now we need shaders for 1, 2, or 3 lights with 0, 1, 2, or 3 projectors. With reordering, that’s 12 potential shaders. As we add more possible combinations, such as blending between 2 decal or bump maps (versus just a static one), we create a lot of permutations.

It just isn’t practical to precompute and store them all, so we need another solution. Our engine stores shader fragments (‘sub-shaders’) and links/generates the complete set of shaders/passes on the fly.

Here’s the catch though:

With DX8 we have to generate the program shader symbols and compile them, which can cause stalls/poor performance if you’re generating many shaders (even when caching a set of 32 MRU shaders).

With register combiners, we generate our shaders from linked lists of state changes and simply apply them. There’s not much of a performance cost here, and the code to handle this is much simpler than the DX8 version, as we have a rule set that handles the quirks of register combiners and the linking of their states.
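In rough strokes, the scheme looks like this – all names invented, and each apply() would wrap a real state call such as glCombinerInputNV:

```c
/* A shader is a linked list of state-setting ops; binding it is just
 * replaying the list -- no compile step anywhere: */
typedef struct StateOp {
    void (*apply)(const struct StateOp *op);  /* wraps one state call */
    struct StateOp *next;
} StateOp;

/* Chain fragments (decal, bump, shadow, projector...) into one list per
 * permutation, then bind the shader by walking it: */
static void apply_shader(const StateOp *op)
{
    for (; op; op = op->next)
        op->apply(op);
}
```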

If your 3D scene doesn’t require dynamic shader creation, then a compiled-symbols approach is great. If you do require mixing and matching shading fragments on the fly, then a state-driven approach works well.

Note:: the only ‘state-configurable’ sets of graphics HW controls that I know of are the traditional fixed pipeline and register combiners, whereas everything else is compiled symbols, such as DX8 & NV_vertex_program. That’s the reason I was excited that ATI’s approach might enable better ‘permutation management’ by being easier to generate shaders for on the fly.

See Also::
NVLINK/NVASM

Final Note::
It’s not the ‘state-driven’ versus ‘compiled-symbols’ issue that matters here; it’s simply how easy/costly it is to generate new permutations on the fly.

Sorry this is long, but please lemme know if this is clear (or if I’m missing something).

I am not talking about DX8 bytecodes. I’m talking about the NV_vertex_program extension. It compiles text to state at run-time, just like a state-ful API compiles a series of function calls to state at run-time. If you don’t see that these are mostly equivalent, I don’t know what else to say.

Originally posted by jwatte:
It compiles text to state at run-time, just like a state-ful API compiles a series of function calls to state at run-time. If you don’t see that these are mostly equivalent, I don’t know what else to say.

I agree that both approaches can achieve the same functionality. I never disputed that.

It just seems that, between setting states and generating and compiling text, the compiling method has cost overhead and coding complexity for on-the-fly code generation that the first method lacks.

I don’t see how they are ‘equivalent’, since one requires the performance cost of allocating_shader_space/generating_id/compiling/state-setting and the other costs just state-setting. I also think that writing code to handle ‘permutations’ of shader fragments is more complex in text, but that can really be an API-design issue.

The only measurable information I have in this matter comes from benchmarking functionally identical DX8 pixel shaders and register combiners on a GeForce3. The time spent in ‘creating/binding’ the pixel settings is 6~15 times higher for the compiled approach. Clearly, this is an arbitrary comparison, but it does bias my appreciation of state-set shading over text-compiled shading.

I am pleased that you started this thread, as I think there is a lot to consider regarding API approaches.

Whether it’s compiled & bound text, state-set functions, or some other approach, ease of coding and performance costs are going to matter.

With 20~40 shaders per frame, the pros & cons of a shading methodology come down to ease of coding. With 100s of shaders per frame, the performance of generating shaders starts to matter as well.

You may think that 100s of shader permutations per frame is unrealistic, but it is needed to increase scene complexity.

Yes, I fully agree with V on this. I have exactly the same problem (generating shaders on the fly), and the overhead of compiling them through D3DXAssembleShader (sorry, DX specific) is pretty bad for the more complicated ones…

On the other hand, I so much prefer using vertex/pixel shader instructions to a series of cryptic state-changing API calls (like the register combiners extension)…

I don’t see how they are ‘equivalent’ since one requires the performance cost of allocating_shader_space/generating_id/compiling/state-setting and the other costs just state-setting.

just state-setting, huh?

Ever wondered what actually goes into “just” setting state? Hint: the state needs to go somewhere :)

If compiling a textual representation of some state changes is appreciably slower than performing the same number of state changes as function calls, something is wrong in the text compiler. You really shouldn’t underestimate the importance of quality of implementation, though, I guess.

So, as I said: I like an interface where there are higher-level concepts (“apply projection matrix”). I like those concepts to be applied by text, if possible. If not, I’ll just do my own text->state routine, although that will cause many calls into the driver instead of one. Under DirectX, driver calls are Really Expensive (tm); under OpenGL, somewhat less so, but still perceptible.