March '04 meeting notes

Originally posted by Madoc:
I cannot understand why anyone would ever want to use a high level shading language. I wouldn’t say it’s really any easier than VP and FP, you can still write quick and dirty code with VP and FP. It seems like a completely useless and meaningless abstraction, it doesn’t have any of the real advantages of a high level language. I am honestly puzzled by the motivation behind this. I am also quite ignorant due to lack of interest, perhaps someone can enlighten me?
The advantage of high-level shading languages is that the driver has a lot more freedom and ease in optimization than with low-level code. This takes away some of the need for the shader programmer to optimize for each vendor.

Originally posted by valoh:
Well, do you use x86 or the corresponding ISA on your platform for software development?

Yes. Of course. Particularly for something performance critical such as what’s done for every vertex.
That said, VP and FP are hardly assembly level now, are they? They are specialised low-level languages, but far from assembly, and quite notably so in terms of ease of use.

Originally posted by Adruab:
but future hardware will support dynamic branching (NVIDIA’s 6800 supports dynamic loop branching, at the very least).

Yes, that’s the only thing I can come up with as a valid point myself: future developments in programmable hardware. It still seems to me that practical concerns were not the priority in developing these languages. How long will it be before a decent slice of the market benefits noticeably from this?

Originally posted by Adruab:
There are plenty of interviews with people saying how the optimized version of their high-level shader compiled to within one instruction of their hand-optimized low-level version, which took them 7x as long to write.

I find this hard to believe. Who said this? Someone who can’t just be assumed to be a really bad programmer?
What I see is that when you implement something in a low level language you usually discover a completely different formulation for your problem which is far more efficient when implemented thus. I have some longish “vertex shaders” that I have implemented in C, x86 assembler and VP. Each implementation is quite different in what it does but they all arrive at the same result. I know no compiler could do as much to rearrange the problems so efficiently for the available instructions. And you definitely see the difference between that and a naive conversion.

I find it generally hard to believe (if not comical) that a high level language is intended to better optimise code. When has this ever been the case?
So long as the low level languages are sufficiently close to the hardware there are logical optimisations that only the programmer will be able to do. VPs and FPs still give the drivers a fair amount of freedom for other optimisations.

Of course, VP and FP could actually have little to do with the actual hardware and then everything I’ve said is moot and arguments for high level languages are all good. I wouldn’t be too surprised, I’ve seen enough BAD design decisions.

Originally posted by Madoc:
Yes, that’s the only thing I can come up with as a valid point myself: future developments in programmable hardware. It still seems to me that practical concerns were not the priority in developing these languages. How long will it be before a decent slice of the market benefits noticeably from this?

Considering that this language (or at least the framework) is designed to last TEN YEARS, I would think this is a valid point. And with next-gen GUIs using hardware acceleration (i.e. OS X and Longhorn), a “newish” and fast 3D card MAY become standard, so sales reps can show off eye candy.

Originally posted by Adruab:
There are plenty of interviews with people saying how the optimized version of their high-level shader compiled to within one instruction of their hand-optimized low-level version, which took them 7x as long to write.

I find this hard to believe. Who said this? Someone who can’t just be assumed to be a really bad programmer?
Don’t quote me on this, but I think it was the Half-Life 2 guys.

What I see is that when you implement something in a low level language you usually discover a completely different formulation for your problem which is far more efficient when implemented thus. I have some longish “vertex shaders” that I have implemented in C, x86 assembler and VP. Each implementation is quite different in what it does but they all arrive at the same result. I know no compiler could do as much to rearrange the problems so efficiently for the available instructions. And you definitely see the difference between that and a naive conversion.

I find it generally hard to believe (if not comical) that a high level language is intended to better optimise code. When has this ever been the case?
So long as the low level languages are sufficiently close to the hardware there are logical optimisations that only the programmer will be able to do. VPs and FPs still give the drivers a fair amount of freedom for other optimisations.

Of course, VP and FP could actually have little to do with the actual hardware and then everything I’ve said is moot and arguments for high level languages are all good. I wouldn’t be too surprised, I’ve seen enough BAD design decisions.
This is exactly the point: the ASM-level interfaces only map “roughly” to the existing hardware from NVIDIA and ATI (and I’m sure 3Dlabs has plenty to say about this). Even then, you have lots of little “rules” for each vendor. (i.e. with ATI you have swizzling and texture-dependency issues, but possible bonuses from co-issued instructions; with NVIDIA, you can actually go faster when code is rearranged so that you execute more instructions.)

Also keep in mind that once a feature becomes “core” in OpenGL it is going to be there for a very long time, so it had better be generic enough to handle future developments.

On your last point, you say that it is a BAD design decision not to map the ASM interfaces directly to hardware. If you were to write an ASM interface, whose hardware would you map it to? ATI’s? NVIDIA’s? 3Dlabs’? These are not CPUs where each vendor implements the same opcodes in hardware. (This would only work if one vendor got 95%+ of the market, making their opcodes a standard for competitors to implement; think of Intel, which allowed AMD to come along later.)

Originally posted by sqrt[-1]:
On your last point, you say that it is a BAD design decision not to map the ASM interfaces directly to hardware. If you were to write an ASM interface, whose hardware would you map it to? ATI’s? NVIDIA’s? 3Dlabs’? These are not CPUs where each vendor implements the same opcodes in hardware. (This would only work if one vendor got 95%+ of the market, making their opcodes a standard for competitors to implement; think of Intel, which allowed AMD to come along later.)
Huh, no, not directly. I hope they’re relatively close, though. After all, the languages are not that low level; the instructions are very much like what a vector library might provide, and my understanding is that the graphics hardware operates with similar functions.
I was thinking mostly of the awful mess of all the multitexture extensions, which IMO made a really lousy job of exposing HW functionality.

Originally posted by Madoc:
I cannot understand why anyone would ever want to use a high level shading language. I wouldn’t say it’s really any easier than VP and FP, you can still write quick and dirty code with VP and FP. It seems like a completely useless and meaningless abstraction, it doesn’t have any of the real advantages of a high level language. I am honestly puzzled by the motivation behind this. I am also quite ignorant due to lack of interest, perhaps someone can enlighten me?
With low level languages:

-There is no portable way to use all the instructions of every piece of hardware.

-Long shaders are difficult to write.

-When new hardware appears, your old shaders will not take advantage of any of the new functionality.

With high level languages:

-The compiler is free to optimize the code using all the power of present and future graphics hardware, and it usually does so better than most low-level programmers.

-Shaders are easier to write (and shaders will become long programs someday); see the sketch after this list.
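
As a quick illustration of the readability point, here is a minimal sketch of a per-pixel diffuse term in GLSL (the names baseTex, lightDir and uv are made up); the driver is free to map these few lines onto whatever DP3/MAX/MUL sequence or fused equivalent the target hardware prefers:

    /* Illustrative only: a per-pixel diffuse term in GLSL, held as a
       C string for glShaderSourceARB. The names here are made up. */
    static const char *diffuse_fs =
        "uniform sampler2D baseTex;\n"
        "uniform vec3 lightDir;      /* normalized light direction */\n"
        "varying vec3 normal;\n"
        "varying vec2 uv;\n"
        "void main()\n"
        "{\n"
        "    float ndotl = max(dot(normalize(normal), lightDir), 0.0);\n"
        "    gl_FragColor = texture2D(baseTex, uv) * ndotl;\n"
        "}\n";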

And, to finish, some words from John Carmack:

 At that point, a higher level graphics API will finally make good sense. There is debate over exactly what it is going to look like, but the model will be like C. Just like any CPU can compile any C program (with various levels of efficiency), any graphics card past this point will be able to run any shader. Some hardware vendors are a bit concerned about this, because bullet point features that you have that the other guy doesn't are a major marketing feature, but the direction is a technical inevitability. They will just have to compete on price and performance. Oh, darn.

That seems to make sense and it’s very pretty but doesn’t meet any practical concern of mine. I am most concerned with how old hardware will run my code. New hardware will just do it better. Whatever I’m aiming at today, I don’t need to worry about how tomorrow’s hardware will run it.

If I rewrite something in assembler and it’s 3 times as fast as the C implementation then I know I’ve done something useful and the chances of future hardware and compilers beating that are both slim and utterly irrelevant.

I also have to repeat that these languages are not that low level. I don’t see them as providing such a barrier for compilers and they’re also essentially vector and math operations which haven’t changed in centuries. It’s not like we’re dealing with the subtleties of register stacks and instruction parallelism of a given processor architecture (I won’t argue about “register” limitations in today’s hardware, that’s just about quantities and depends on the program, not the language).

These high level languages still seem like a bit of a marketing-oriented thing to me. I still haven’t heard anything said that gives me any practical reason to want to use them. I guess ease of use and familiarity might be a point for some people, fair enough.

I guess it might also depend on the application. In my experience, really long shaders are too slow for any realistic RT application these days (I’m excluding demos). My really long shaders are not really intended for real-time (or applications that care about compatibility at all), though they’re still VP/FP and they still want to be as fast as possible. I guess future hardware might run them in real time, I don’t care, we’ll be doing different stuff then.

It just seems a bit early to talk about a complete move, IMO.

Originally posted by Madoc:
If I rewrite something in assembler and it’s 3 times as fast as the C implementation then I know I’ve done something useful and the chances of future hardware and compilers beating that are both slim and utterly irrelevant.
Have you tried making such a comparison? Write a long shader in both ARB_fp and GLSL and compare the performance.
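
A crude way to run that comparison, for what it’s worth (the two draw functions are hypothetical placeholders that each render one frame with the respective shader bound; clock() is rough, a platform wall-clock timer would be better):

    #include <time.h>
    #include <GL/gl.h>

    /* Hypothetical helpers: each renders one frame with the
       respective shader (ARB_fp or GLSL) already bound. */
    void draw_frame_arbfp(void);
    void draw_frame_glsl(void);

    double seconds_for(void (*draw_frame)(void), int frames)
    {
        int i;
        clock_t t0;
        glFinish();            /* drain pending GL work first */
        t0 = clock();          /* crude; a wall-clock timer is better */
        for (i = 0; i < frames; ++i)
            draw_frame();
        glFinish();            /* wait for the GPU to finish */
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    /* e.g. compare seconds_for(draw_frame_arbfp, 1000)
       with         seconds_for(draw_frame_glsl, 1000) */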

I will, though right now I have other things to get on with. I was however speaking about the general case of low vs high level languages, by no means am I expecting similar results from shader languages.
It’s silly really, VP and FP have nothing to do with assembler. I would actually hesitate to even call them low level (err… what was the actual definition again?), it’s only the syntax that’s similar.

I don’t want to sound obstinately contrary to HLSLs, I keep wanting to agree but I can’t convince myself. I still don’t get what all the big hype is about.

Since day 1 I’ve been complaining about the limitations and rigidity of shader languages in dealing with small variations in state. I’ve seen high level languages appear, but not a solution to the more fundamental problems behind today’s programmable pipes. Fixed function is all about adjusting state for shaders, lights etc. and it works efficiently; with programmable shaders each little variation means a new shader, and the number of shaders would grow exponentially. In practice we end up with less flexible and/or less efficient renderers. Sure, we have more flexibility in what shaders we can design, but inflexibility in the interaction between states.

I think that one of the big optimizations that can come from writing CPU assembler code by hand rather than trusting the compiler is in avoiding memory access. By keeping everything in registers as much as possible, one avoids any potential bandwidth and latency issues that may slow performance. That’s not the case in shader languages, since the only memory access comes from texture reads and framebuffer writes. Temporary storage in a shader program can only be in registers. As such, this is one optimization that cannot be taken advantage of.

This leaves instruction counts and architecture matching (which includes register counts, on relevant hardware). I remember the early days of D3D’s HLSL where the resulting assembler shader code was much more efficient in terms of instruction counts, but lately the compiler has gotten very good indeed, the difference coming down to within one or two instructions. I’m sure that GLSL implementations will achieve similar results, eventually. Actually, GLSL will do better, since it’s possible to do specific architecture matching.

with programmable shaders each little variation means a new shader, and the number of shaders would grow exponentially.
Have you not heard of uniforms? Have you not yet discovered that you can have conditional branching (in vertex shaders, at least)?

The days when you needed to make a new shader for adding/removing various features are over. You can make one monolithic vertex shader and just pass various uniforms that deal with flow control. Granted, you have to accept the performance penalty in doing so.
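
Something like this minimal sketch (the uniform names are made up, and the gl_LightSource indexing assumes simple directional lights with normalized eye-space positions):

    /* One monolithic vertex shader steered by uniforms instead of a
       separate shader per feature combination. Names are made up. */
    static const char *mono_vs =
        "uniform int  numLights;  /* set per object, no recompile */\n"
        "uniform bool enableFog;\n"
        "varying vec4 color;\n"
        "void main()\n"
        "{\n"
        "    vec4 eye    = gl_ModelViewMatrix * gl_Vertex;\n"
        "    gl_Position = gl_ProjectionMatrix * eye;\n"
        "    vec3 n      = normalize(gl_NormalMatrix * gl_Normal);\n"
        "    vec4 c      = vec4(0.0);\n"
        "    for (int i = 0; i < numLights; ++i)\n"
        "        c += gl_LightSource[i].diffuse\n"
        "           * max(dot(n, gl_LightSource[i].position.xyz), 0.0);\n"
        "    if (enableFog)\n"
        "        gl_FogFragCoord = length(eye.xyz);\n"
        "    color = c;\n"
        "}\n";

    /* Switching behaviour is then just a uniform update, e.g.
       glUniform1iARB(numLightsLoc, 2); no shader change, no relink. */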

Also, I would like to point out that glslang’s shader linking facility allows you to build little modules of shader code that you can link together as needed (in theory at runtime, though NVIDIA’s compiler doesn’t really allow for that). So, at least there, you have some possibility for dynamic shader construction.
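
Roughly like so, with the ARB_shader_objects entry points (the applyFog split is my own made-up illustration, and it assumes the extension’s function pointers are loaded):

    #include <GL/gl.h>
    #include <GL/glext.h>  /* assumes ARB_shader_objects entry points are loaded */

    /* Fog module: compiled on its own; swap it out to change fog methods. */
    static const char *fog_module =
        "uniform vec3  fogColor;\n"
        "uniform float fogDensity;\n"
        "vec4 applyFog(vec4 c)\n"
        "{\n"
        "    float f = clamp(exp(-fogDensity * gl_FogFragCoord), 0.0, 1.0);\n"
        "    return vec4(mix(fogColor, c.rgb, f), c.a);\n"
        "}\n";

    /* Main module: only declares applyFog; the linker resolves it. */
    static const char *main_module =
        "vec4 applyFog(vec4 c);\n"
        "uniform sampler2D baseTex;\n"
        "varying vec2 uv;\n"
        "void main() { gl_FragColor = applyFog(texture2D(baseTex, uv)); }\n";

    static GLhandleARB compile(GLenum type, const char *src)
    {
        GLhandleARB s = glCreateShaderObjectARB(type);
        glShaderSourceARB(s, 1, &src, NULL);
        glCompileShaderARB(s);
        return s;
    }

    static GLhandleARB build_fogged_program(void)
    {
        GLhandleARB p = glCreateProgramObjectARB();
        glAttachObjectARB(p, compile(GL_FRAGMENT_SHADER_ARB, main_module));
        glAttachObjectARB(p, compile(GL_FRAGMENT_SHADER_ARB, fog_module));
        glLinkProgramARB(p);    /* applyFog is resolved here */
        return p;
    }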

I see Madoc’s point. For performance reasons, one would need quite a few shaders to implement a general pipe; the branching support/performance just isn’t there right now. Sure, you can theoretically make it happen, but it’s going to be slow, ergo essentially useless.

Think of implementing the 2000+ shaders in Quake3 in GLSL, for example. Brrrr, shiver me timbers. This shader model requires a scripted wrapper. You could then pick an optimal path based on the contents of the shader, and so on. But it would be impractical to implement this in VP/FP directly, there are simply too many permutations. I think that the shader model requirements for performance, quantity, and artist facility will ultimately determine the method used.

As for the question of high level shading languages; well, I wouldn’t want to go back to ASM in my project. The benefits of high level languages transcend mere convenience into the realms of time and creativity.

the branching support/performance just isn’t there right now. Sure, you can theoretically make it happen, but it’s going to be slow, ergo essentially useless.
I know this is the case for fragment programs, but what about vertex programs (people still use those, right)? What is the performance cost for looping there?

@Korval
I was mainly referring to FPs, but also the general problem of configuring a single general pipe for thousands of shaders; this is difficult with VPs as well. Although, Cg’s ‘interface’ construct is an intriguing possibility.

My ideal configuration would require only a single VP and FP for the entire pipe, but this really isn’t practical in current hardware. Suppose one could parse a single VP/FP into its constituent parts, replacing if/else with separate programs. The problem would still be intractable with a large number of configuration possibilities. Again, I’m referring mainly to FPs.

Wow, lots has happened so far. I somewhat agree with the speed benefit for asm GPU programming.

At the moment, since GPU programs must stay under a certain number of instructions and a certain number of registers, it does make more sense to be able to program manually within those boundaries. High-level code can compile to exactly what hand-assembled files would, assuming all the normal options are available (which they are not at the moment in GLSL; read: NVIDIA’s half type). Either way, writing in a super-generic fashion just doesn’t work right now given the common range of cards out there (hence MS’s effect files for multiple versions…).

In the end I think it’s a mistake not to maintain the assembly versions in GL (I’m not sure if this is the case or not). However, OpenGL is supposed to be forward-looking, and this is what they are trying to do. At the moment, DX’s system is easier for current and past hardware. The more complicated GPU hardware gets, the better the high-level languages will perform. In my opinion, an interesting option would be the multipass compiler (like Ashli), though it seems a lot of information about how to compile would be needed to make it very successful in practical circumstances.

FPs are where most of the serious shader work is, in my experience. Per-vertex/per-pixel hybrids have horrible artifacts that often defeat the purpose of doing anything per-pixel. I end up doing little in VP if I use FP for anything much.

I’m not sure what a good solution might be. I would like to see more modular programs but I haven’t thought it through well enough to propose an actual API solution.
Being able to switch a program module that does fog, for instance, would be useful. Also being able to globally use one method or another without having to repeat shaders for none, method1, method2 etc. I see fog and the fog method as something that is part of the environment, not the shader (though they obviously can and should interact to some extent).
I would even dare go as far as having distinct shader/light-interaction code fragments that can be interchanged depending on the light type, and even allow enabling and disabling of multiple lights.

I actually thought about implementing a system that could programmatically generate shaders from code snippets to deal with all the cases as they were required. It seemed that it could work well, though I don’t recall the details too well now. I should imagine that a more dynamic, driver-side version of such a system should be quite possible. Possibly it would be a difficult API to design well.
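
A rough sketch of the kind of generator I mean (everything here, from the state flags to the snippet text, is hypothetical):

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical material state driving the generated shader. */
    struct material_state {
        int has_base_texture;
        int fog_method;          /* 0 = none, 1 = exponential (toy) */
    };

    /* Append a snippet, respecting the buffer size. */
    static void emit(char *out, size_t cap, const char *snippet)
    {
        strncat(out, snippet, cap - strlen(out) - 1);
    }

    /* Concatenate GLSL snippets according to the state. Declarations
       have to be patched in too, which is exactly the fiddly part. */
    void build_fragment_shader(const struct material_state *m,
                               char *out, size_t cap)
    {
        out[0] = '\0';
        emit(out, cap, "varying vec2 uv;\n");
        if (m->has_base_texture) emit(out, cap, "uniform sampler2D baseTex;\n");
        if (m->fog_method == 1)  emit(out, cap, "uniform float fogDensity;\n");
        emit(out, cap, "void main() {\n    vec4 c = ");
        emit(out, cap, m->has_base_texture ? "texture2D(baseTex, uv);\n"
                                           : "vec4(1.0);\n");
        if (m->fog_method == 1)  /* toy fog term: just attenuate */
            emit(out, cap, "    c.rgb *= exp(-fogDensity * gl_FogFragCoord);\n");
        emit(out, cap, "    gl_FragColor = c;\n}\n");
    }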

Back on the vertex/pixel shader hybrid thing: you can actually shift a lot of work to the vertex shader as special-case optimisations, i.e. a known viewer or light distance allows more approximate methods, avoiding renormalisation, interpolating vectors, etc. This would also be useful as a lightweight per-object state switch that doesn’t require trillions of different shaders. It could certainly make a drastic difference performance-wise; these are very valuable optimisations.
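
For instance (a made-up pair of variants), a distant light lets you skip the per-fragment renormalisation, since the interpolated light vector barely bends across a triangle:

    /* General case: renormalize the interpolated light vector per fragment. */
    static const char *fs_general =
        "varying vec3 lightVec;\n"
        "varying vec3 normal;\n"
        "void main()\n"
        "{\n"
        "    vec3  L = normalize(lightVec);\n"
        "    float d = max(dot(normalize(normal), L), 0.0);\n"
        "    gl_FragColor = vec4(d);\n"
        "}\n";

    /* Special case for a distant light: the vertex shader already wrote
       a normalized, near-constant lightVec, so skip the renormalize. */
    static const char *fs_distant_light =
        "varying vec3 lightVec;\n"
        "varying vec3 normal;\n"
        "void main()\n"
        "{\n"
        "    float d = max(dot(normalize(normal), lightVec), 0.0);\n"
        "    gl_FragColor = vec4(d);\n"
        "}\n";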

Since you mentioned an API design for different snippets of code and such… Microsoft seems to be coming up with something like that with native support in Longhorn (Windows Graphics Foundation).

Not only that, but Unreal Technology 3 says it will support dynamic recompilation of changing parts of shaders (e.g. varying the types and number of lights by recompiling the shader, or at least that is my impression).

There are also similar implementations now for simple cases. An example from the Windows dev conference presentations comes to mind: having an array of compiled shaders indexed by numbers, like the number of bones and lights (at run time you just index into the array to get the correct version without doing any dynamic branching on the hardware).
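
Presumably something along these lines (build_program and shader_body are hypothetical helpers); the count is baked in as a compile-time constant so the compiler can unroll everything and the GPU never branches:

    #include <stdio.h>
    #include <GL/gl.h>
    #include <GL/glext.h>

    #define MAX_LIGHTS 4

    /* Hypothetical: compiles and links GLSL source into a program. */
    extern GLhandleARB build_program(const char *fragment_source);
    /* Hypothetical: a shader body whose loop runs NUM_LIGHTS times. */
    extern const char *shader_body;

    static GLhandleARB light_progs[MAX_LIGHTS + 1];

    void init_light_programs(void)
    {
        char src[4096];
        int n;
        for (n = 0; n <= MAX_LIGHTS; ++n) {
            /* Bake the light count in as a compile-time constant. */
            snprintf(src, sizeof src, "#define NUM_LIGHTS %d\n%s",
                     n, shader_body);
            light_progs[n] = build_program(src);
        }
    }

    /* At draw time there is no GPU branching at all:
       glUseProgramObjectARB(light_progs[num_active_lights]); */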

You won’t be able to release GLSL code because drivers for ATI and NV are still beta, so you should stick to VP/FP for now and plan on GLSL.

If you will be working with GLSL, you have to keep in mind the hw limits (don’t do looping, don’t do branching on ATI, …)

Shading is in its infancy, but after a couple of generations (~2 years) the hardware will be fancy enough that you can code more freely, without worrying about incapable GPUs.

The NV 6800 is pretty powerful as is … living proof! Seen the videos?

I saw the Unreal 3 tech presentation just after making that post. I thought it might be common practice when I saw that. I do some very simple switching of code snippets myself, but I currently don’t go beyond basic material terms and textures. I’d like something really flexible with a whole lot of optimised paths, but there’s really quite a lot to it; you also need to start handling register declarations and usage, for example. It’s not just patching code together.

I’m very impressed with those Unreal 3 shots; I thought it would be a little while still before we saw that kind of quality in games. I wonder what the performance implications of that kind of HDR and HDR glow are. Looks great. I also wonder how much of that stuff is a hack, i.e. the tinted soft shadows.

And I wonder how much time has gone into the art. When source art reaches film quality, with millions of polys per model, art generation will take a while for those next-gen games.

The tinted soft shadows needn’t be a hack. It’s a generic textured-light issue. You could even potentially render the projective image (a cubemap in that case) tinted through arbitrary (and even moving) transparent surfaces modeled in 3D and make it work. Once you handle light correctly, the need for hacks diminishes significantly IMHO.

The thing I’m most curious about is handling shaders correctly in the context of a robust lighting solution. You can’t just implement any old shader: key terms must be modulated by shadow terms, and others moved into passes that aren’t fragment culled, making some kind of restricted shader framework inevitable. Doom 3 has the monolithic shader with preset lighting terms; Unreal 3 seems to move a bit beyond that with some programmability, however that needs to coexist elegantly with your lighting. Half-Life 2 seems to take the approach of allowing a selection from several available terms.

The iridescence shader in the Unreal 3 demo with lighting was intriguing, and I wonder how much of a hack that was. Was someone able to write some kind of thin-film component refraction shader and apply it to an environment map term easily, for example, or was it painstakingly implemented to coexist with existing effects by reimplementing a totally different shader path?