ARB June Meeting

On standard assembly this would go to a MUL then a MAD instruction (and perhaps a temporary register). However, in register combiners you could have done the whole line in one instruction.

You underestimate the people writing drivers. After all, MUL / MAD can be used to build a syntax tree (or whatever it’s called - I’m no compiler expert), which can then be optimized to whatever else the hardware supports.

You underestimate the people writing drivers. After all, MUL / MAD can be used to build a syntax tree (or whatever it’s called - I’m no compiler expert), which can then be optimized to whatever else the hardware supports.
So the “assembly language” is actually going to be parsed, a syntax tree created, and then code generated from the tree. I.e., either way you’re putting a full compiler into the driver.

That being the case, what does the “assembly language” get you other than the ability to write MUL TEMP, a, b; MAD frag.out, c, d, TEMP; instead of out.frag = (a * b) + (c * d)?
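Just to make that concrete, here’s a throwaway C++ sketch (my own, purely hypothetical names, nothing from any actual driver) of the kind of tree both spellings boil down to. Once you’re at this point, the surface syntax is irrelevant.

#include <string>

// Both "MUL TEMP, a, b; MAD frag.out, c, d, TEMP;" and
// "out.frag = (a * b) + (c * d);" reduce to the same expression tree.
struct Expr {
    std::string op;     // "add", "mul", or a variable name for a leaf
    const Expr* lhs;    // null for leaves
    const Expr* rhs;
};

int main() {
    Expr a{"a", nullptr, nullptr}, b{"b", nullptr, nullptr};
    Expr c{"c", nullptr, nullptr}, d{"d", nullptr, nullptr};
    Expr ab{"mul", &a, &b};
    Expr cd{"mul", &c, &d};
    Expr sum{"add", &ab, &cd};
    // whatever back end walks 'sum' is free to emit MUL+MAD, one register
    // combiner stage, or anything else the hardware happens to have
    (void)sum;
}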

Originally posted by al_bob:
You and others keep repeating this. Yet no one can produce a single example where this is irrevocably true.

use cg and you’ll already see quite some losses…

use a to-x86 compiler and run it on a p4 and you’ll see quite some losses

if the assembly language in the end is just another representation of the c-style language, it’s useless, as you might as well compile the c-style language if it has to get compiled in the drivers anyway.

if it’s not, then it means performance losses.

there are those two ways:

compile to a general asm (like x86), and directly run that. that means all hw will be limited to x86-only implementations. the p4 shows how that can limit performance

convert to a general assembler-style, or even binary, representation of the c-code, and send that to the driver. compile there to get the best performance.

the first one is stupid, i guess everyone agrees.
the second one is debatable, but i would prefer a binary intermediate format then. the gain is that you can add other language-to-glbin compilers, and stuff…
not that i like that you can have tons of languages to do the same task. it means people will use tons of languages, depending on their likes, and that means you have to be able to read them all, too… much more work for everyone.

the current idea is to use the c-style language as the representation of the shader, and optimize from this directly to the best code for the hardware.

and i think everyone agrees that we should have only one optimisation/compilation step. from shader code to shader.

it’s not actually important which language the shader is stored in: c, asm, binary, whatever…

i’d go for binary then. like java…

the second one is debatable, but i would prefer a binary intermediate format then. the gain is that you can add other language-to-glbin compilers, and stuff…
not that i like that you can have tons of languages to do the same task. it means people will use tons of languages, depending on their likes, and that means you have to be able to read them all, too… much more work for everyone.

How is that “glbin” different from the assembly language? It has all the same limitations and problems as assembly.

The only real differences are:

  • “glbin” is already parsed. No need to do that in the driver.
  • “glbin” isn’t directly writable.

Whether the second one is an advantage is debatable (although I like the idea of exposing this in text form).

The first one, though, is what some of the people here are worried about. Ever written an assembly-language parser? It’s much, much easier and faster than a C parser, simply because the syntax is completely dumbed down.
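For the record, here’s roughly what “parsing” an assembly-style instruction amounts to (a hypothetical sketch I just typed up, not anyone’s real code). One opcode, a comma-separated operand list, done; there’s no precedence, no nesting, no declarations:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct Instruction {
    std::string opcode;
    std::vector<std::string> operands;
};

// Parse one "OP dst, src0, src1, ...;" line.
Instruction parseLine(const std::string& line) {
    Instruction inst;
    std::istringstream in(line);
    in >> inst.opcode;                        // first token is the opcode
    std::string rest, operand;
    std::getline(in, rest);                   // the remainder is the operand list
    std::istringstream ops(rest);
    while (std::getline(ops, operand, ',')) { // operands are just comma-separated names
        size_t b = operand.find_first_not_of(" \t");
        size_t e = operand.find_last_not_of(" \t;");
        if (b != std::string::npos)
            inst.operands.push_back(operand.substr(b, e - b + 1));
    }
    return inst;
}

int main() {
    Instruction i = parseLine("MAD result.color, c, d, temp;");
    std::cout << i.opcode << " with " << i.operands.size() << " operands\n";  // MAD with 4 operands
}

A C-style grammar, with expressions, scopes and statements, needs a real lexer and a recursive parser before you even get this far.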

use cg and you’ll already see quite some losses…

I’m confused now. How would you know how to optimize for the given card, given that the information isn’t readily available? How is the Cg compiler mature? Is there an HLSL compiler that you can name that will compile to native code so that you can compare to Cg?

and i think everyone agrees that we should have only one optimisation/compilation step. from shader code to shader.

C compilers have multiple optimization stages (at the front-end and the back-end, at least; some have more). Why should it be different for shaders?

Originally posted by Zengar:
What I mean - access to the assembly should be granted openly, so that anyone could write his own HLSL. I think it’s a big mistake to make a HL language to the core.

Microsöft Visual Fragment++ anyone?

use a to-x86 compiler and run it on a p4 and you’ll see quite some losses

Compared to what? An equivalently-clocked P3? The P4 will eat it up because it has a larger front-side bus. The losses due to the CPU will be hidden because the app is more memory bound than anything else.

Compare it to an equivalently-clocked AthlonXP? No contest, the XP will win. Of course, Athlons have always won against Pentium-class processors, clock-for-clock that is. You can compile the code specifically for a P4, and the XP will still smoke it.

The second point, however, is irrelevant (though still true). The first point is the important one. New hardware doesn’t just change how the internals of the shader processor work. The entire thing gets faster. Whatever speed losses, compared to optimized code, exist can be covered up by the speed gains from mere brute force.

And, of course, that assumes that the compiler for this new GPU lacks the information necessary to compile the ISA code optimally, which you have still provided no proof of. Give me one example, just one example, of some fundamental structure in C that must be retained in order for optimization to work properly.

On standard assembly this would go to a MUL then a MAD instruction (and perhaps a temporary register). However, in register combiners you could have done the whole line in one instruction. So unless you want to add a ton of specific “weird” instructions to the assembly spec, you could never optimize for all these cases without allowing the driver access to the high-level language.

The solution would be a compiler that was smart enough to recognize this expression in the assembly, and optimize it appropriately. If the compiler isn’t smart enough, then nVidia clearly hasn’t done their job correctly.

You’d have to do the same in a C compiler. Only, it’s a lot harder to parse.

So the “assembly language” is actually going to be parsed, a syntax tree created, and then code generated from the tree. I.e., either way you’re putting a full compiler into the driver.

We already have that. ATi’s ARB_fp drivers have to build a dependency graph for texture accesses just to compile its shaders. And 3DLabs’s hardware is scalar-based; it doesn’t look anything like those opcodes.
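To give an idea of what that dependency analysis involves (an invented instruction format, not ATi’s actual code): you walk the program and work out how long the chain of dependent texture fetches gets, because that’s what has to be split into phases/passes:

#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct FInst {
    std::string dst;                 // register written
    std::vector<std::string> srcs;   // registers read
    bool isTexFetch;                 // true for texture sample instructions
};

// Length of the longest chain of dependent texture fetches in the program.
int indirectionDepth(const std::vector<FInst>& prog) {
    std::vector<std::pair<std::string, int>> depth;   // per-register fetch depth
    auto lookup = [&](const std::string& r) -> int {
        for (const auto& p : depth)
            if (p.first == r) return p.second;
        return 0;
    };
    int maxDepth = 0;
    for (const FInst& i : prog) {
        int d = 0;
        for (const std::string& s : i.srcs) d = std::max(d, lookup(s));
        if (i.isTexFetch) ++d;                        // a fetch deepens the chain
        depth.push_back({i.dst, d});
        maxDepth = std::max(maxDepth, d);
    }
    return maxDepth;
}

int main() {
    std::vector<FInst> prog = {
        {"uv",  {"texcoord0"}, false},
        {"t0",  {"uv"},        true},    // first fetch
        {"t1",  {"t0"},        true},    // dependent fetch: coords come from t0
        {"out", {"t1", "c0"},  false},
    };
    std::cout << indirectionDepth(prog) << "\n";      // prints 2
}

The point is just that even the “assembly” extensions already force the driver to do real compiler work.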

What we want is a language that is low level enough that compilers for different high-level languages can be written. We don’t mind optimizations being done in the drivers.

convert to a general assembler-style, or even binary, representation of the c-code, and send that to the driver. compile there to get the best performance.

Which is precisely what we are discussing. Welcome to the conversation. Glad you could make it.

it means people will use tons of languages, depending on their likes, and that means you have to be able to read them all, too… much more work for everyone.

I tend to prefer the freedom of choosing a shading language.

Maybe Pixar would like to take all those Renderman shaders they have and put them into some kind of hardware form. Of course, in order to do it with glslang, they’d have to, by hand, walk through all their shader code and transcribe it.

However, with the paradigm we’re proposing, all they need to do is make a compiler for their shaders.

To be fair, they could write a compiler to go to glslang. But, making a compiler from something like C to something like C is a pain. An assembly-like language is much nicer in terms of transcription.

Also, what if I see certain deficiencies in glslang? For example, for the kinds of things I want to do (piecing together bits of shaders to build up a full shader), I’d like to have header files. Of course, glslang doesn’t allow that (at least, not without lots of string copying and splicing). I’d like a shading language that does. Or, I’d like to augment the (public) glslang compiler with a #include directive.

I can’t do any of that with glslang as the last stop until hardware. The best way to do this is with a simpler intermediate language that is easy to compile to.

and i think everyone agrees that we should have only one optimisation/compilation step. from shader code to shader.

Why? I’ve got no problems with multi-step processes, especially when one of them is a pre-process step that makes a runtime step that much faster.

A non-trivial amount of optimization can be done at the glslang-to-ISA compiler level. Dead-code can be eliminated. Basic, fundamental optimizations can be made here that would not need to carry over into the driver’s compiler. In that way, all the driver’s compiler needs to worry about is converting the ISA assembly into hardware opcodes, and doing hardware-specific optimizations on them. An assembly parser is much easier than a C parser. Easier means more bug-free. And bug-free is good.
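As a toy illustration of the sort of thing the off-line step could settle once and for all (hypothetical instruction format, not a real compiler), dead-code elimination is just a backward liveness walk over a flat instruction list:

#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Inst {
    std::string dst;                 // register written
    std::vector<std::string> srcs;   // registers read
};

// Drop instructions whose results never reach a shader output.
// (Ignores re-definitions of the same register -- it's only a sketch.)
std::vector<Inst> removeDeadCode(std::vector<Inst> prog,
                                 const std::set<std::string>& outputs) {
    std::set<std::string> live = outputs;
    std::vector<Inst> kept;
    for (auto it = prog.rbegin(); it != prog.rend(); ++it) {
        if (live.count(it->dst)) {               // result is needed later
            live.insert(it->srcs.begin(), it->srcs.end());
            kept.push_back(*it);
        }                                        // otherwise: dead, drop it
    }
    return std::vector<Inst>(kept.rbegin(), kept.rend());
}

int main() {
    std::vector<Inst> prog = {
        {"t0",  {"a", "b"}},    // t0 = a * b
        {"t1",  {"c", "c"}},    // t1 = c * c   (never used -> removed)
        {"out", {"t0", "d"}},   // out = t0 + d
    };
    std::cout << removeDeadCode(prog, {"out"}).size() << " instructions kept\n";  // prints 2
}

The driver never has to repeat that work; it gets a program that has already been scrubbed.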

i’d go for binary then. like java

A binary format makes it harder to write compilers (especially for machines with different endian-ness). Also, it makes it a bit more difficult to extend the language.

Is there an HLSL compiler that you can name that will compile to native code so that you can compare to Cg?

To be fair, there is.

All Radeon 9500+ drivers come with a preliminary implementation of glslang (the extension strings aren’t exposed, but you can get the entrypoints). I can’t tell you how well it works, but it is there. I imagine that ATi intends to expose these extensions directly once 1.5 is fully approved.

Well, they expose the old GL2 proposed extension, so it’s not up to date with the new ARB one in the current (3.6) Catalyst drivers, but it’s still working. Haven’t tested the speed, just played with it to make sure that it fits my engine.

I guess we have to wait and see how this turns out. I think I saw small hints about a revision in the ARB_*_programs in the latest meeting notes, so you might still use another compiler, and afaik, CG announced that they should be able to ‘compile’ to GLslang sooner or later (I might be wrong there).

Originally posted by Korval:
Compared to what? An equivalently-clocked P3? The P4 will eat it up because it has a larger front-side bus. The losses due to the CPU will be hidden because the app is more memory bound than anything else.


compared to compiling with a directly p4 optimizing compiler that can use more than only x86 asm, and can actually know how the hw behind works. for example the intel c++ compiler which gains up to 5x speed increase over vc6 in software rendering apps here…

you have one serious issue: an assembler does not OPTIMIZE. what you want in the driver is a compiler. that means a translator, interpreter, and optimizer.

we generally take “write in assembly” to mean “write it and feed it directly into the hw that way, WITHOUT FURTHER CHANGES”.

the assembly you want is simply a high-level shading language unrelated to hw, which looks like good old assembly, more or less. for most people, a c-styled language unrelated to hw is much more convenient, that’s why cg is here, that’s why hlsl is here, that’s why glslang will be here.

haven’t ever seen a compiler for real assembly before. only for pseudo-assembly, like for example ARB_fp and ARB_vp… they have compilers (“shader optimizers, cheaters… whatever”) in the background…

i have no problem in having asm as a high-level shading language in opengl. it’s just not hip.

and if you say you want assembler, everyone assumes you want a one-to-one mapping of your code to hardware.

Originally posted by Korval:
What we want is a language that is low level enough that compilers for different high-level languages can be written. We don’t mind optimizations being done in the drivers.

Let me turn some of the questions you’ve been asking around. Is there anything you can do in an assembler-style intermediate language that you cannot do in a higher level language like glslang? What precludes you from compiling your favorite high-level language into a trivial glslang main function with a simple list of expressions?

As an example, I’d bet it wouldn’t be very difficult to write a glslang backend for Cg. I can’t say for certain since I’ve never written a Cg backend, but I don’t see any obvious technical obstacles.

In fact, I recall from programming language courses (way back when) that this is how some other non-shading languages were originally developed. Their compilers emitted C source code as an “intermediate language”, which was then compiled to the native CPU format by an established C compiler.

We should postpone this discussion until all video cards have the exact same feature set. People here have drawn comparisons with how CPUs are programmed, but in doing so they overlook one crucial fact: a Pentium4 can perform every computation an Athlon can do and vice versa. The optimal way to get to the result may be different on both CPUs, but you know that both of them can get there.

This is far from true for Radeons and GeForces. The biggest concern right now should not be finding the optimal path for each card, it should be finding a path (preferably one that doesn’t end up with software emulation). Until all cards have a “complete” feature set, I believe IHVs will continue to expose custom programming interfaces, be it assembly or HLSL, and the only way to get properly optimized shaders will be to use these proprietary interfaces instead of the standardized ones.

– Tom

[This message has been edited by Tom Nuydens (edited 07-30-2003).]

you have one serious issue: an assembler does not OPTIMIZE.

We’re not discussing an “assembler”. As you point out later, modern assembly shader languages are compiled, not assembled.

Is there anything you can do in an assembler-style intermediate language that you cannot do in a higher level language like glslang?

You mean, besides the reasons I’ve already given?

From Korval:

  1. Freedom of high-level language. We aren’t bound to glslang. If we, for whatever reason, don’t like it, we can use alternatives.
  2. Ability to write in the assembly itself.

And let’s not forget the notion that writing an optimizing C compiler is a non-trivial task. Neither is writing an optimizing assembler of the kind we are referring to, but it is easier than a full-fledged C compiler. Easier means easier to debug, less buggy implementations, etc. And, because there will then only need to be one glslang compiler, all implementations can share that code.

I can’t say for certain since I’ve never written a Cg backend, but I don’t see any obvious technical obstacles.

You mean, besides the whole, “Outputting to a C-like language rather than something simpler,” problem?

The biggest concern right now should not be finding the optimal path for each card, it should be finding a path (preferably one that doesn’t end up with software emulation).

Um, no. I refuse to accept a non-optimal solution. Whatever solution is picked should allow for the production of an optimal shader for the hardware in question. Performance is still the #1 priority; without it, you can’t afford to write longer shaders. Maybe you don’t work in a performance-based realm, but that doesn’t mean that performance isn’t vital. It’s just not vital for you.

Until all cards have a “complete” feature set, I believe IHVs will continue to expose custom programming interfaces, be it assembly or HLSL, and the only way to get properly optimized shaders will be to use these proprietary interfaces instead of the standardized ones.

That’s ridiculous.

I’ve never doubted that the glslang in-driver compiler would produce optimal results. It is in every card manufacturer’s best interest (if they support the language) to produce optimal results; that’s why Intel’s compiler works so well.

Glslang will get there. However, all I’m saying is that glslang isn’t the only way to do it; there are other approaches that can still get the performance, but are lower-level, so that high-level compilers can easily be written to compile to them. That way, we aren’t bound to glslang.

Currently, most cards that could even consider supporting glslang can’t support certain features in hardware (texture access in vertex shaders, etc). So, it’s only a matter of finding which subset of functionality makes cards run reasonably fast.

You’re not going to see an ARB_vertex_program_2 or ARB_fragment_program_2, ever. ATi’s throwing everything behind glslang; that’s how they plan to expose functionality.

[This message has been edited by Korval (edited 07-30-2003).]

FYI,

talked to a 3DLabs glslang compiler writer/developer at SIGGRAPH yesterday. He more or less said this (not an exact quote): “It seems that some people like to have an assembler-like language to fool around with, to do some tweaking. How often do you look at your CPU-generated code? We’re putting a lot of effort into making the compiler as good as possible, so you can concentrate on high-level parts of your algorithm/shader”.

EDIT: Duh, my question was something like:
Will 3DLabs be releasing any performance docs/tips on shader coding? As an example I brought up NVIDIA’s register-usage issues.

[This message has been edited by roffe (edited 07-30-2003).]

Korval:

quote:The fact is that today's hardware interface is very, very similar to arb_fp.

Oh really? I wouldn’t be too surprised if the 9500+ series wasn’t just running souped-up versions of ATI_fragment_shader-based hardware. The dependency chain is what makes me think of this. If you made an ATI_fragment_shader implementation with 4 passes and lots of opcodes/texture ops per pass, you could implement ARB_fp on top of that kind of hardware, as long as you could build a dependency chain.

And even if they’re not, how can you be so sure that the hardware looks anything like ARB_fp?

That’s not exactly what I meant. I meant that arb_fp is not exactly independent of the underlying hardware of both nVidia’s and ATi’s latest GPUs. It looks to me like a hack to make them “compatible”. But that’s just me. As I said I have no experience with programmable hardware (I have an R100).
I’m sure the 9500+'s internal ISA is very similar to the extensions it provides, but that just backs up my claims that you can still use CG or your own high-level language in any case. The optimal interface is there!

quote:I don't like the idea of having different code for arb_fp1 arb_fp1.1 arb_fp2k5.
quote:A new extension for each generation! But isn't that what's supposed to change? How future proof is that?

I hope you don’t expect glslang 1.0 to be the final version. If you do, you’ll be sorely disappointed.

Whatever shading language the driver uses will change over time (probably about once per generation). Whether it happens to be glslang or an ISA of some form doesn’t matter.

It will change a lot more if you use ASM. As long as it’s the hardware-specific extensions changing, we can (as drivers permit) rely on glslang to make the necessary optimizations/changes. Just like CG (should)!

quote:If speed is what matters most, just use the hardware-specific extensions. If we want shaders that work across generations optimally (compiler dependent of course), a high level language is a good solution. Both forward and backward compatibility is important.

Why is “forward compatibility” important? As long as old shaders work on new hardware, and work faster than they did on older hardware, then you’re getting all the functionality you need.

I don’t get it… You want the best performance out of your shaders or not? if not, if you just want them to “work”, then you should be ok… otherwise… well, you should change your opinion.

Also, as I pointed out before, you have failed to give a reason as to why a lower-level solution could not be compiled into just as optimized a solution as the higher-level one. All you’ve done is simply state that this is the case; that doesn’t make it true.

You don’t realize you’re talking about a high-level assembly language. Something that has to be “translated” to the native assembly. Anyway I prefer writing “var_a=var_b*var_c;” rather than using a bunch of movs and temp regs in x86 and letting the compiler figure out the optimal code.

[stupid example]
Next thing you’ll be proposing a new instruction for “e=sqrt( sqr(a*b+c)+(c*d)+c+d);” like “do_a_sqrt_of_the_sqr_of… r1,r2,r3,r4,r5,r6…”. I know you could decompose into several instructions, but isn’t it obvious that you should preferably use the high-level syntax?
[/stupid example]

Next thing you’ll be proposing a new instruction for “e=sqrt( sqr(a*b+c)+(c*d)+c+d);” like “do_a_sqrt_of_the_sqr_of… r1,r2,r3,r4,r5,r6…”. I know you could decompose into several instructions, but isn’t it obvious that you should preferably use the high-level syntax?

If the hardware exposes such an instruction, it’s up to the compiler in the driver to convert the equivalent series of opcodes to that instruction. Please, read the posts above for details.
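To spell that out with a toy example (my own invention, not actual driver code): recognizing a plain opcode sequence and folding it into whatever fused instruction the hardware has is a simple peephole pass.

#include <iostream>
#include <string>
#include <vector>

struct Op {
    std::string name;                 // "MUL", "ADD", "MAD", ...
    std::vector<std::string> args;    // destination first, then sources
};

// Fuse "MUL t, a, b" followed by "ADD d, t, c" into "MAD d, a, b, c".
// (A real pass would also verify that t isn't read anywhere else.)
void fuseMulAdd(std::vector<Op>& code) {
    for (size_t i = 0; i + 1 < code.size(); ++i) {
        Op& mul = code[i];
        Op& add = code[i + 1];
        if (mul.name == "MUL" && add.name == "ADD" && add.args[1] == mul.args[0]) {
            add = Op{"MAD", {add.args[0], mul.args[1], mul.args[2], add.args[2]}};
            code.erase(code.begin() + i);         // the MUL is no longer needed
        }
    }
}

int main() {
    std::vector<Op> code = {{"MUL", {"t0", "a", "b"}},
                            {"ADD", {"out", "t0", "c"}}};
    fuseMulAdd(code);
    std::cout << code[0].name << "\n";            // prints MAD
}

The exotic “do_a_sqrt_of_the_sqr_of” unit from the example above would just be a bigger pattern in the same pass.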

I don’t get it… You want the best performance out of your shaders or not? if not, if you just want them to “work”, then you should be ok… otherwise… well, you should change your opinion.

You want your shaders to run optimally on current hardware. But you also want them to run faster on future hardware. It’s up to NV/ATI/whoever to make them run optimally (or not) on future hardware. After all, who knows what new hardware will have in capabilities (apart from the people here who work for those companies, of course)? Besides, it’s in those companies’ interest to make things run optimally on their hardware.

compared to compiling with a directly p4 optimizing compiler that can use more than only x86 asm, and can actually know how the hw behind works. for example the intel c++ compiler which gains up to 5x speed increase over vc6 in software rendering apps here.

Of course, you’re not comparing Intel C to assembly, or VC6 to assembly, or Intel C to VectorC (or any other vectorizing compiler). You’re comparing Intel C from 2003 to VC6 from 1998. This “5x” figure is completely meaningless, and, as far as I’m concerned, completely bogus. Vectorizing compilers are good, but not that good.

Of course, Athlons have always won against Pentium-class processors, clock-for-clock that is.

Yes and no. I can trivially write a loop that will run 2x faster (clock for clock) on a P4 than on an Athlon XP. I can also trivially write a loop that will run faster on an Athlon 800 MHz than on a Pentium 4 3.2 GHz. It’s all about knowing what the CPUs are good at (or not).
That said, the information necessary to convert a code sequence that’s optimal for one chip into the optimal code for another is not lost in the assembly.

If the compiler isn’t smart enough, then nVidia clearly hasn’t done their job correctly.

nVidia also doesn’t have unlimited resources to throw at the problem.

Originally posted by Korval:
Um, no. I refuse to accept a non-optimal solution. Whatever solution is picked should allow for the production of an optimal shader for the hardware in question. Performance is still the #1 priority; without it, you can’t afford to write longer shaders.

I was thinking of NVidia’s floating-point precision woes. The only way to get an NV30 to perform at its best is to exploit the fact that you can lower shader precision where possible. Neither ARB_fp nor GLslang allows you to do this: all they can give you is a shader that makes optimal use of a subset of the hardware. If you want to make optimal use of the full HW capabilities, your only choice is a proprietary interface.

Of course I don’t expect this particular problem to persist beyond the current HW generation, but lord knows what’s in store for us when the next generation comes around.

– Tom

“Is there anything you can do in an assembler-style intermediate language that you cannot do in a higher level language like glslang?”

You mean, besides the reasons I’ve already given?

  1. Freedom of high-level language. We aren’t bound to glslang. If we, for whatever reason, don’t like it, we can use alternatives.
  2. Ability to write in the assembly itself.

You’ve given me some reasons why you prefer writing in an assembler-like intermediate language, but I’m still unconvinced that there’s anything you can do in an assembler-style intermediate language that you can’t do in a high level language like glslang.

As one obvious example, the assembler-like statement:
“ADD r0, r1, c0”
can be trivially converted to the glslang statement:
“r0 = r1 + c0”

Or:
“DP4 r0, c0, v0”
becomes:
“r0 = dot(c0, v0)”
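To make it concrete, emitting either form from a front end is close to a table lookup. A throwaway sketch (hypothetical code, not any real tool):

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Map one assembler-style statement onto the equivalent glslang statement.
std::string toGlslang(const std::string& op, const std::vector<std::string>& a) {
    static const std::map<std::string, std::string> infix = {
        {"ADD", "+"}, {"SUB", "-"}, {"MUL", "*"}};
    if (infix.count(op))                                        // e.g. ADD r0, r1, c0
        return a[0] + " = " + a[1] + " " + infix.at(op) + " " + a[2] + ";";
    if (op == "DP4")                                            // e.g. DP4 r0, c0, v0
        return a[0] + " = dot(" + a[1] + ", " + a[2] + ");";
    return "/* unhandled op: " + op + " */";
}

int main() {
    std::cout << toGlslang("ADD", {"r0", "r1", "c0"}) << "\n";  // r0 = r1 + c0;
    std::cout << toGlslang("DP4", {"r0", "c0", "v0"}) << "\n";  // r0 = dot(c0, v0);
}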

In the end, both the assembler-style program and the trivial glslang conversion should produce the same hardware instruction sequence. So what do you gain if your alternative language frontend produces ARB_vp/fp sourcecode as its intermediate language that you don’t get if your alternative language frontend produces glslang sourcecode as its intermediate language? What advantage do you get by having the “ability to write in the assembly itself”?

Originally posted by secnuop:
What advantage do you get by having the “ability to write in the assembly itself”?

If I may answer, it’s compile time. The first conversion could be done off line and thus save work on the user’s machine.

Korval,
believe it or not, I think now we’re in agreement about what an intermediate language should be (if we need one). My apologies if you meant all that from the outset; we could have spared ourselves a lot of trouble.

So, to recap: an intermediate code representation must
a) encode all constructs of the high-level language (e.g. we need a ‘sine’ encoding vs. a Taylor series)
b) preserve all type information
c) preserve all instruction flow information (functions, loops, branches)
d) preserve lifetime information (variable scopes) or put all scoped temporaries into a flat temp var space. Can’t say for sure, I think I’d prefer the latter.

These are the requirements for not destroying any information. The first processing step can also do
a) syntax error checking
b) semantics error checking (outputs aren’t written to, that kind of stuff)
c) flow analysis and dead code removal
d) constant folding

There are also a few things this first step should not do, most prominently register allocation and scheduling.
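Roughly what I have in mind for the intermediate record, as a hypothetical sketch (names and layout invented on the spot):

#include <string>
#include <vector>

enum class Type { Float, Vec3, Vec4, Mat4, Bool };

struct IrOp {
    std::string op;            // "mul", "sin", "texture2D", "branch", ...
    Type resultType;           // b) type information is kept, not flattened away
    std::vector<int> args;     // indices of earlier ops -- no register names anywhere
};

struct IrBlock {               // c) control flow stays structured
    std::vector<IrOp> body;    //    straight-line ops of this block
    std::vector<int> children; //    nested blocks (then/else arms, loop bodies) by index
};

int main() {
    std::vector<IrBlock> blocks(1);
    // a) real constructs survive: 'sin' stays an op, not a baked-in Taylor series
    blocks[0].body.push_back({"sin", Type::Float, {0}});
}

Register allocation and scheduling deliberately have nowhere to live in that representation; they stay in the driver’s pass.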

The speed gains by doing that sort of stuff off line aren’t too promising. After all, you’re representing everything in some sort of byte code instead of strings, but string parsing is hardly slow enough to make this matter.

The processing steps required for this sort of thing are thin and light.
You’re IMO not saving much work. With the exception of flow analysis. And this will have to be repeated in the ‘second pass’ if this is still the one that does the optimizing work.
If you leave it out of the first pass (and forfeit dead code analysis), you’re doing very close to nothing.

I don’t think this will produce any appreciable execution time savings. I hope I’m clear now.

But of course you have a second point, allowing other layered front ends to access an intermediate code interface. And if I got your idea of intermediate code right this time, I won’t argue about that.

We’re putting a lot of effort into making the compiler as good as possible, so you can concentrate on high-level parts of your algorithm/shader

Since I don’t plan on buying 3DLabs hardware anytime in the near future, I don’t care what their compiler can do. I’m interested in what ATi’s compiler can do, and what nVidia’s compiler (if they even deign to make one) can do.

Will 3DLabs be releasing any performance docs/tips on shader coding? As an example I brought up NVIDIA’s register-usage issues.

The whole point of a high-level language is that you let someone else deal with the hardware-based performance issues. Clearly, 3D Labs has no knowledge of how to optimize for GeForceFX cards; only nVidia knows that. And in either case, you shouldn’t have to do anything special for them.

It will change a lot more if you use ASM.

Based on what do you say this? Do you have some knowledge of a feature you expect to see in the near future?

You want the best performance out of your shaders or not? if not, if you just want them to “work”, then you should be ok… otherwise… well, you should change your opinion.

Shaders written, and compiled (to the ISA), on older hardware should run faster on the new hardware than they did before. If they don’t work as fast as they could if they were re-compiled (with new opcodes), then fine. As long as the brute-force method is faster than it was on the old hardware, everything should be fine.

Anyway I prefer writing “var_a=var_b*var_c;” rather than using a bunch of movs and temp regs in x86 and letting the compiler figure out the optimal code.

You don’t have to write in the ISA; there would be (off-line) compilers to turn glslang, or Cg, or whatever, into the ISA. You never have to touch the assembly.

The only way to get an NV30 to perform at its best is to exploit the fact that you can lower shader precision where possible. Neither ARB_fp nor GLslang allows you to do this: all they can give you is a shader that makes optimal use of a subset of the hardware. If you want to make optimal use of the full HW capabilities, your only choice is a proprietary interface.

True, but that’s a “failing” of glslang and ARB_fp. The languages aren’t rich enough to specify precision information. And the default precision they require is too great for 16-bit floats.

So what do you gain if your alternative language frontend produces ARB_vp/fp sourcecode as its intermediate language that you don’t get if your alternative language frontend produces glslang sourcecode as its intermediate language?

Well, for one, you don’t have to go through the pain of writing out glslang rather than assembly. Writing assembly is, comparatively, easy. Writing correctly-formed glslang (with conditionals, code blocks, etc) is much harder.

Also, C/C++ compilers these days expect code to be written in a certain way. They compile code expecting that a human wrote it. So they don’t expect to see things like:

int iTemp1 = A * B;
int iTemp2 = C * D;
E = iTemp1 + iTemp2;

So they may not optimize it correctly if your front-end compiler spits out that kind of glslang code. A compiler for an assembly-based language, on the other hand, would expect exactly this kind of input. Certainly, a glslang compiler wouldn’t prioritize this case for optimization over other, more likely, cases.

What advantage do you get by having the “ability to write in the assembly itself”?

Really short shaders (Like Result = tex(TexCoord0)) can just as easily, and much faster, be written in the assembly rather than in glslang, which involves various visual overhead (if nothing else, putting it in a function).

I don’t believe glslang will be the final solution. To have a standardized high-level language is a Good Thing; it will save us all lots and lots of work. An assembler language that can be executed on all gfx-hardware is vital however, and I believe new ARB extensions will provide that. My point is, glslang will co-exist with lots of other high-level languages (like Cg) but be the standard for high level gfx programs.

[Edit] - Perhaps it will even be possible to write asm blocks in glslang just like in C.

[This message has been edited by dbugger (edited 07-31-2003).]

al_bob:

quote:Next thing you’ll be proposing a new instruction for “e=sqrt( sqr(a*b+c)+(c*d)+c+d);” like “do_a_sqrt_of_the_sqr_of… r1,r2,r3,r4,r5,r6…”. I know you could decompose into several instructions, but isn’t it obvious that you should preferably use the high-level syntax?

If the hardware exposes such an instruction, it’s up to the compiler in the driver to convert the equivalent series of opcodes to that instruction. Please, read the posts above for details.

I meant an all-new opcode in the intermediate asm. In this case, all backward compatibility breaks apart.

If you still prefer to write the sequence of instructions needed to maintain compatibility (hoping for the compiler to optimize) instead of writing in a higher-level language, then our opinions are different.

My idea of perfection is to have a variable number of direct hardware interfaces, and an independent language to use them transparently. Both Cg (offline) and glslang (online) are able to accomplish this. I just don’t see the need for another limited general ASM between them. The main disadvantage of Cg is of course the compile time needed, but we’re not talking about pages and pages of shader code, are we?

Korval:

quote:It will change a lot more if you use ASM.

Based on what do you say this? Do you have some knowledge of a feature you expect to see in the near future?

The more abstract (higher-level) a language is, the easier it is to make it fit the hardware’s restrictions and features. It’s exactly because I don’t have any knowledge of future features that I prefer glslang over “your” intermediate ASM. How many changes have you seen made to C/C++ to make it work on a particular architecture? Are you able to say that it has lost any kind of optimization abilities? ASM imposes restrictions on how you do things!

quote:You want the best performance out of your shaders or not? if not, if you just want them to “work”, then you should be ok… otherwise… well, you should change your opinion.

Shaders written, and compiled (to the ISA), on older hardware should run faster on the new hardware than they did before. If they don’t work as fast as they could if they were re-compiled (with new opcodes), then fine. As long as the brute-force method is faster than it was on the old hardware, everything should be fine.

What ISA? The intermediate or the low-level one? Anyway, if you leave that re-compile up to the driver, this problem goes away.

quote:So what do you gain if your alternative language frontend produces ARB_vp/fp sourcecode as its intermediate language that you don’t get if your alternative language frontend produces glslang sourcecode as its intermediate language?

Well, for one, you don’t have to go through the pain of writing out glslang rather than assembly. Writing assembly is, comparatively, easy. Writing correctly-formed glslang (with conditionals, code blocks, etc) is much harder.

??? Are you really sure about this? I thought high-level languages were created to ease the life of the programmer. Does CG have any reason to exist after all?

quote:What advantage do you get by having the “ability to write in the assembly itself”?

Really short shaders (Like Result = tex(TexCoord0)) can just as easily, and much faster, be written in the assembly rather than in glslang, which involves various visual overhead (if nothing else, putting it in a function).

What’s wrong with glBindTexture(tex)?

Your argument fails in the case of complex shaders, where the plethora of asm instructions needed will result in severe visual and algorithmic overhead (lack of simplicity and clarity).