ARB June Meeting

You’ve just given the archetypical example, where you can more than quadruple (float) throughput by not using assembly. In fact, […] If you use the same high-level code you should have used ten years ago to begin with, on an up-to-date compiler, you win.

Let’s ignore for a second the change in the example you used. We’re comparing x87 to SSE 1/2 now.

Ignoring precision issues, yes, you can compile C floating-point code into SSE. But you can do just as well on assembly (x87) code! This (unfortunately) isn’t a problem that can only be solved by a high-level language - the methods that work with C work equally well with x86 assembly.
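
For concreteness, here is a rough sketch of the kind of code this argument is about (plain C; the gcc flags in the comments are real, the file name is only illustrative, and the exact output depends on the compiler version). The same source can be turned into x87 or SSE code just by retargeting the back end, and the claim is that the same pattern matching could, in principle, be run over a neutral assembly input as well:

    /* y[i] = a * x[i] + y[i]
     *
     * From this one source a compiler can emit x87 stack code
     * (fld/fmul/fadd/fstp) or scalar SSE code (mulss/addss), e.g.:
     *
     *   gcc -O2 -mfpmath=387 -S saxpy.c                  (x87)
     *   gcc -O2 -march=pentium4 -mfpmath=sse -S saxpy.c  (SSE)
     *
     * A vectorizing compiler can go further and use packed mulps/addps
     * to process four floats per instruction.
     */
    void saxpy(float a, const float *x, float *y, int n)
    {
        int i;
        for (i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }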

As Korval pointed out:

What is it about C that allows for these optimizations transparently that an assembly language would not allow for? Also, why is it that these facilities that allow for the transparent optimizations cannot be given to the assembly as well as a C-like system? Remember, the assembly doesn’t have to closely resemble the final hardware data; it can have facilities that don’t look much like common assembly.

That’s an optimizing compiler. A second one.

Yes it is. I see only one on-line optimizing compiler though (the one that matters). The other (if present) is off-line. I don’t see what the problem is. This is done every day by modern CPUs, in hardware. It could also be done in hardware, on GPUs, but typically isn’t.

Why does a middle layer need to be defined and exposed?

Your concerns have already been addressed; I shall repeat them here:

  • You get your pick of HLSL. If Cg is your thing, then by all means use it. If you don’t like Cg, then use whatever else you like.
  • If none of those suit you, you still have access to the low-level assembly, so you can write your own code, or write your own HLSL. Call it zeckensackG or something, which may or may not be the same as t0yG.

The conversion to any intermediate “this isn’t the real thing anyway”-language is completely devoid of any merit.

Perhaps you should explain that to the nice people who write GCC. After all, they’re facing similar problems to what the ARB is: their high-level compiler needs to work on all these different platforms. So as not to duplicate most of the optimizer, the platform-independent optimizations are performed on the C code, which is then converted to an intermediate-level assembly language. That intermediate assembly is then converted (and optimized) to the platform-specific code.
And btw, GCC, on integer code, does not suck.

“Traditional” software is distributed precompiled because of several issues I don’t even want to enumerate here, because none of them apply to shader code.

On the contrary - most of the issues deal with IP and/or user-interfacing. There are no real technical issues why “traditional” software isn’t distributed in source-form. In fact, some Linux distributions (Gentoo) install themselves by downloading source code from the internet and compiling it specifically for your platform.

We are discussing the future… Do you have facts from the future?

No, but I’m not the one making factual claims either. I’m providing evidence that the ISA approach can work as well as, if not better than, the glslang approach.

Things have changed a lot since the 8086 and, as you know, most code from those days won’t run properly in today’s systems and vice-versa.

They won’t run on today’s OS’s, or maybe motherboards, or other hardware. However, the fundamental machine language itself can be executed on a P4 just as well as a 286 (assuming that 32-bit extensions or other instruction-set extensions aren’t in use).

If you hand x87 assembly nicely scheduled for a 486DX to a P4, you’ll lose.

Define “lose”. To me, a loss would be, “It runs slower than it did before.” A win would be, “It runs faster.”

Now, for the Intel x86 architecture case, this may be correct, because the processor is not allowed to do things like re-order large sequences of opcodes. It can do some out-of-order processing, but not to the level a compiler can.

In the case of this proposal for GPU’s, driver writers get the entire program to compile. Where the P4 can’t produce optimal instructions simply because it can’t help but work with what it’s got, the driver can compile it and do whatever re-ordering is required.

And, even so, let’s say that hardware 2 years from now running assembly compiled from a high-level language written today doesn’t perform as fast as it would if the high-level language were compiled directly. So? As long as it is faster than it was before (and it should still be, on brute force of the new hardware alone), then everything should be fine.

The conversion to any intermediate “this isn’t the real thing anyway”-language is completely devoid of any merit.

Unless you don’t want to be a slave to glslang, that is. If you, say, want to have options as to which high-level language to use, OpenGL is clearly not the place to be. No, for that, you should use Direct3D.

Maybe, for whatever reason, I like Cg more. Maybe, for whatever reason, I don’t like any of the high-level languages and I want to write my own compiler. Or, maybe my 2-line shader doesn’t need a high-level language, and I want to just write it in assembler.

The fact that you are happy with glslang does not preclude anyone else from not liking it, or wanting an alternative.

Java may benefit from this approach because the size of distributed code is a concern. Java also pays a very real performance penalty for it.

That’s not entirely true. JIT compilers, these days, can get native Java code (anything that’s not windowed) pretty close to optimized C: 80-95% or so. And these are for large programs, far more complicated than any shader will ever be.

And Java doesn’t use bytecode to shrink the size of the distribution. It uses bytecode because:

  1. They believe, as many do, that the idea of having people compile a program they downloaded is asinine and a waste of time.

  2. They want to be able to hide their source code.

  3. Bytecode is just an assembly language that the Java interpreter understands. They needed a cross-platform post-compiled form of code. The solution is some form of bytecode.

Why do we need to define and expose any sort of middle interface and layer an external compiler on top of that? Where are the benefits vs a monolithic compiler straight from high level to the metal?

You must mean, of course, besides the reasons I have given twice and ‘al_bob’ gave once?

And let’s not forget the notion that writing an optimizing C compiler is a non-trivial task. Neither is writing an optimizing assembler of the kind we are referring to, but it is easier than a full-fledged C compiler. Easier means easier to debug, less buggy implementations, etc. And, because there will then only need to be one glslang compiler, all implementations can share that code.

Also, one more thing. nVidia is widely known as the company that set the standard on OpenGL implementations. They were the ones who first really started using extensions to make GL more powerful (VAR, RC, vertex programs, etc). Granted, Id Software didn’t really give them a choice, but they didn’t make nVidia expose those powerful extensions. I doubt there are any games that even use VAR, and even register combiners aren’t in frequent use, though 2 generations of hardware support them. Yet, nVidia still goes on to advance the cause of OpenGL.

nVidia has made no bones about not being happy with the current state of glslang. Now, they can’t really go against OpenGL overtly (by dropping support), because too many games out there use it (Quake-engine based games, mostly). But, they don’t have to be as nice about exposing functionality anymore. Or about having a relatively bug-free implementation. As long as those bugs don’t show up in actual games (i.e., in features that real game developers actually use), it doesn’t hurt nVidia.

Also, they can choose not to provide support for glslang at all, even if it goes into the core. They can’t call it a GL 1.6 implementation, but they can lie and call it nearly 1.6. Even Id Software can’t afford to ignore all nVidia hardware; they’d be forced to code to nVidia-specific paths. And by them doing so, they would be legitimizing those paths, thus guaranteeing their acceptance.

Rather than risk this kind of split in the core (where you have core functionality that a good portion of the market supports and a good portion doesn’t; this isn’t good for OpenGL), the optimal solution would have been the compromise we’re suggesting here. There would be a glslang, but it wouldn’t live in drivers. It would compile to an open extension defining an assembly-esque language that would be compiled into native instructions.

That way, you can have a glslang that the ARB can control, but you don’t force all OpenGL users to use it.

Granted, the reason the ARB didn’t go that way was not some notion of, “putting glslang into drivers is the ‘right thing’.” No, it’s there because it hurts Cg, and therefore nVidia. ATi and 3DLabs have a stake in hurting things that are in nVidia’s interests. Killing the ability for Cg to be used on OpenGL in a cross-platform fashion is just the kind of thing that they would like to do to nVidia. And, certainly, using the glslang syntax over the Cg one (even though neither offers additional features over the other) was yet another thing ATi and 3DLabs wanted to do to hurt Cg; it makes it more difficult for Cg to be “compiled” into glslang.

[This message has been edited by Korval (edited 07-28-2003).]

Originally posted by al_bob:
Ignoring precision issues, yes, you can compile C floating-point code into SSE. But you can do just as well on assembly (x87) code! This (unfortunately) isn’t a problem that can only be solved by a high-level language - the methods that work with C work equally well with x86 assembly.

As Korval pointed out:
“What is it about C that allows for these optimizations transparently that an assembly language would not allow for? Also, why is it that these facilities that allow for the transparent optimizations cannot be given to the assembly as well as a C-like system? Remember, the assembly doesn’t have to closely resemble the final hardware data; it can have facilities that don’t look much like common assembly.”
Again.
The issue is that it’s not monolithic. You need to define and expose the middle interface because otherwise you couldn’t layer a compiler on top of it.
As soon as you expose it, you need to keep supporting the defined rules because users can go there. You need to duplicate the parsing and error checking already done in the front end. You risk destroying semantics. You risk underexposing resources for the sake of cross-vendor compatibility.

Originally posted by al_bob:
Yes it is. I see only one on-line optimizing compiler though (the one that matters). The other (if present) is off-line. I don’t see what the problem is. This is done every day by modern CPUs, in hardware.
No, it’s not done by modern CPUs. A CPU can’t transform x87 code to SSE code because it is obliged to follow the defined operation of the ISA. x87 has different rounding modes, exception handling, flags and register space. You may reuse the same execution units (Athlon XP), or you may not (P4), but the code will never be executed like SSE code would be executed. The public ISA nails 'em down. (High level) source code doesn’t.
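
A small, compiler-dependent illustration of why the two aren’t interchangeable (plain C; the exact result depends on compiler, flags and precision-control settings, which is precisely the point): x87 keeps intermediates in 80-bit registers by default, while SSE2 doubles are rounded to 64 bits at every step, so silently re-executing x87 code as SSE code could change results.

    #include <stdio.h>

    int main(void)
    {
        /* 1e16 + 1 is exactly representable in x87's 80-bit extended
         * format, but not in a 64-bit double, where it rounds back to
         * 1e16.  If the intermediate (a + b) stays in an x87 register,
         * d comes out as 1; computed with SSE2 double arithmetic it
         * comes out as 0. */
        volatile double a = 1e16, b = 1.0;
        double d = (a + b) - a;
        printf("d = %g\n", d);
        return 0;
    }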

It could also be done in hardware, on GPUs, but typically isn’t.
Okay.

Your concerns have already been addressed; I shall repeat them here:

  • You get your pick of HLSL. If Cg is your thing, then by all means use it. If you don’t like Cg, then use whatever else you like.

As you believe code reformatting is such a fun thing to do, and you also seem to appreciate layered compiler models, maybe you could just as easily write a Cg to GLslang converter. May I suggest an offline preprocessing step?

  • If none of those suit you, you still have access to the low-level assembly, so you can write your own code, or write your own HLSL. Call it zeckensackG or something, which may or may not be the same as t0yG.
    No, I won’t.
    “Extension mess”
    “Multiple codepaths”
    “deprecated”
    “waste of energy”
    Any of these terms ring familiar?

Perhaps you should explain that to the nice people who write GCC. After all, they’re facing similar problems to what the ARB is: their high-level compiler needs to work on all these different platforms. So as not to duplicate most of the optimizer, the platform-independent optimizations are performed on the C code, which is then converted to an intermediate-level assembly language. That intermediate assembly is then converted (and optimized) to the platform-specific code.
Yeah. I think I’ve already covered that by saying that GCC’s internal code representation is not exposed to users of GCC and is therefore free to evolve.

And btw, GCC, on integer code, does not suck.
It certainly doesn’t. I occasionally use GCC myself and appreciate every last bit of work that went into it. You see, it’s a monolithic application that can turn a portable high-level programming language into reasonably efficient machine code; quite a feat.

GCC supposedly doesn’t fare too well in SpecCPU though.

On the contrary - most of the issues deal with IP and/or user-interfacing.
And these apply to shader code? Well, go ahead, encrypt your shaders, but don’t forget encrypting your textures and sound files, too. I find this idea rather irritating, but I won’t stop you.

There are no real technical issues why “traditional” software isn’t distributed in source-form. In fact, some Linux distributions (Gentoo) install themselves by downloading source code from the internet and compiling it specifically for your platform.
Absolutely. If only shaders could work this way, too … (we could leave the internet downloading part out).

You all seem to be arguing about something other than how high-level the shader language should be. The real issue is exactly how high-level the language should be, not whether it will be assembly-like or look like C.

Let me explain: calling what you (eventually) will send to OpenGL an assembly language is largely a misnomer, simply because it won’t just be “assembled” if current hw is anything to go by; it’ll be compiled. This means you won’t have the simplicity of a direct mapping to hardware instructions. So for this compilation step to work as well as possible (after all, we don’t have to support old legacy code, so there’s no need to be backwards compatible with anything) we need to provide as many hints as possible to the driver. ARB_fp obviously doesn’t provide enough hints as it is. Nvidia would like more info about precision than a monolithic hint, and they also like register use to be very low, while ATI doesn’t care much about register use but has difficulty getting its analysis of falsely dependent lookups to work correctly. There are even more differences, like native sin vs. polynomial approximation vs. texture lookup, that will only be solved well if the driver knows that you want the sine and not something else.
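
As a hypothetical illustration of the sine point (written as plain C rather than any real shading language; wave_hl and wave_ll are made-up names): once a front end has already expanded the sine itself, the back end can no longer tell that a sine was wanted and so can’t substitute a native instruction or a table lookup.

    #include <math.h>

    /* What the shader writer means.  A driver seeing the call itself is
     * free to map it to a native sin, a polynomial, or a texture lookup,
     * whichever suits the hardware. */
    float wave_hl(float x)
    {
        return sinf(x);
    }

    /* What the driver gets if some front end lowers sin() on its own
     * (a rough odd-polynomial approximation, for illustration only). */
    float wave_ll(float x)
    {
        float x2 = x * x;
        return x * (1.0f - x2 * (1.0f / 6.0f - x2 * (1.0f / 120.0f)));
    }
    /* From wave_ll alone the back end can't recover the intent, so the
     * choice of implementation has already been made for it. */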

All this considered I think it’s pretty obvious that the language needs to be high level (in the same sense C is high level). ARB_fp isn’t high level enough as it is, so any new language needs to work out those kinks first. I think glslang is low level enough for most graphics-related work; all the languages I would want are much higher level: a general material system or an image-processing language, for example.

There’s nothing wrong with having a low level language interface as long as it retains code semantics and is high level enough for diverse hardware to execute it efficiently. GLslang isn’t very high level by this standard, and that’s a good thing.

The question then becomes how much compiling we want to do in the driver. Even with ARB_fp the IHVs seem to do lots of optimization in the driver, so we already have this to an extent.

Of course all the string parsing required to compile something as complex as glslang might be overkill. Defining an intermediate bytecode that retains the semantics of the original code might be a good idea if it reduces work for the driver and enables faster linking. I have a feeling that we’ll be doing lots of cut and pasting and/or linking of small shader fragments to get a runnable shader in the future. This is the only way to get light effects shaders to work with a general material system and custom shaders, for example (more like renderman shaders). If the current glslang approach leads to too much overhead then we should of course move to some sort of bytecode that can be linked faster. I doubt that it will need to be very much lower level than the current glslang, however.

Originally posted by al_bob:
That’s exactly what the drivers do! Do you truly believe that NV30 and R300’s native assembly language is ARB_fp? Surely not! They could run Java bytecode for all you know.

Really? If they were running Java then shaders could be a lot more complex and we’re underusing the hardware! You are just saying that arb_fp limits a GPU running bytecode, and that’s exactly what I want to show you…

The fact is that today’s hardware interface is very, very similar to arb_fp. But you know this won’t last. I don’t like the idea of having different code for arb_fp1 arb_fp1.1 arb_fp2k5. It’s enough to please us now but in the long run it will be almost the same as having hardware-specific asm.

Originally posted by Korval:
quote:We are discussing the future… Do you have facts from the future?

No, but I’m not the one making factual claims either. I’m providing evidence that the ISA approach can work as well as, if not better than, the glslang approach.

So, your “evidence” is not factual. Seems like a new trend nowadays…

quote:Things have changed a lot since the 8086 and, as you know, most code from those days won't run properly in today's systems and vice-versa.

They won’t run on today’s OS’s, or maybe motherboards, or other hardware. However, the fundamental machine language itself can be executed on a P4 just as well as a 286 (assuming that 32-bit extensions or other instruction-set extensions aren’t in use).

But of course! A new extension for each generation! But isn’t that what’s supposed to change? How future proof is that?

Alternatively, if our C/C++ code does 32-bit math, you know that it’ll work on all platforms with varying instruction sets. You could code for the 8087 even if you didn’t have one, remember? It’s a question of the flexibility of the processor.

And, even so, let’s say that hardware 2 years from now running assembly compiled from a high-level language written today doesn’t perform as fast as it would if the high-level language were compiled directly. So? As long as it is faster than it was before (and it should still be, on brute force of the new hardware alone), then everything should be fine.

If speed is what matters most, just use the hardware-specific extensions. If we want shaders that work across generations optimally (compiler dependent of course), a high level language is a good solution. Both forward and backward compatibility are important. What I don’t get here is that Cg was supposed to work this way and you are not agreeing with me.

Also, they can choose not to provide support for glslang at all, even if it goes into the core. They can’t call it a GL 1.6 implementation, but they can lie and call it nearly 1.6. Even Id Software can’t afford to ignore all nVidia hardware; they’d be forced to code to nVidia-specific paths. And by them doing so, they would be legitimizing those paths, thus guaranteeing their acceptance.

You are forgetting the huge userbase ATi is getting recently. This was true in the early days when nVidia ruled and other vendors were “ignored”. This is easily turning into an ATi vs nVidia flame war when we should be talking about OpenGL.

Granted, the reason the ARB didn’t go that way was not some notion of, “putting glslang into drivers is the ‘right thing’.” No, it’s there because it hurts Cg, and therefore nVidia. ATi and 3DLabs have a stake in hurting things that are in nVidia’s interests. Killing the ability for Cg to be used on OpenGL in a cross-platform fashion is just the kind of thing that they would like to do to nVidia. And, certainly, using the glslang syntax over the Cg one (even though neither offers additional features over the other) was yet another thing ATi and 3DLabs wanted to do to hurt Cg; it makes it more difficult for Cg to be “compiled” into glslang.

Now you’re getting paranoid. glslang was a work in progress in the ARB for a while, and Cg is an independent project. It was nVidia who chose this path, not the ARB, us, ATi, 3DLabs or whatever other players in this game.

If they lose it, it’s their own fault.

All this considered I think it’s pretty obvious that the language needs to be high level (in the same sense C is high level).

Do you have any actual basis for this claim? While we’re both in agreement that ARB_fp doesn’t cut it, that doesn’t preclude something that looks similar to ARB_fp doing the job.

Defining an intermediate bytecode that retains the semantics of the original code might be a good idea if it reduces work for the driver and enables faster linking.

The question, one that the members of the ARB are probably best suited to answer, is how much of the semantics are absolutely required to get good code. Really, does specifying an expression like this:

D = a*b + dot(r + p, z * q)

really do anything for optimization that simply specifying the sequence of “opcodes” doesn’t?
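
For the sake of argument, here is the same expression spelled both ways (a plain C sketch; dot4 is just a local helper, and the MUL/ADD/DP4 names in the comments are generic ARB_fp-flavoured mnemonics, not any particular instruction set). The question above is whether the second spelling hands the optimizer any less information than the first:

    /* high-level form: D = a*b + dot(r + p, z*q) */
    static float dot4(const float u[4], const float v[4])
    {
        return u[0]*v[0] + u[1]*v[1] + u[2]*v[2] + u[3]*v[3];
    }

    float form_high(float a, float b, const float r[4], const float p[4],
                    const float z[4], const float q[4])
    {
        float s[4], t[4];
        int i;
        for (i = 0; i < 4; ++i) { s[i] = r[i] + p[i]; t[i] = z[i] * q[i]; }
        return a * b + dot4(s, t);
    }

    /* "opcode" form: the same computation flattened into one operation
     * per statement, the way an assembly-style program would spell it:
     *   MUL t0, a, b / ADD t1, r, p / MUL t2, z, q / DP4 t3, t1, t2 / ADD D, t0, t3 */
    float form_ops(float a, float b, const float r[4], const float p[4],
                   const float z[4], const float q[4])
    {
        float t0, t1[4], t2[4], t3;
        int i;
        t0 = a * b;                                    /* MUL */
        for (i = 0; i < 4; ++i) t1[i] = r[i] + p[i];   /* ADD */
        for (i = 0; i < 4; ++i) t2[i] = z[i] * q[i];   /* MUL */
        t3 = dot4(t1, t2);                             /* DP4 */
        return t0 + t3;                                /* ADD */
    }
    /* Both functions encode the same expression tree; the debate is only
     * over how easily a driver can rebuild that tree from the second form. */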

I have a feeling that we’ll be doing lots of cut and pasting and/or linking of small shader fragments to get a runnable shader in the future. This is the only way to get light effects shaders to work with a general material system and custom shaders, for example (more like renderman shaders).

Most definitely (I’m defining a library to manage just such a system). The glslang shader linking mechanism (though I’ve never heard a really good explanation of the details of how it works) is, to me, the one saving grace of the language. It is, also, the feature that should definitely go into any shading language paradigm.

The fact is that today’s hardware interface is very, very similar to arb_fp.

Oh really? I wouldn’t be too surprised if the 9500+ series was just running souped-up versions of ATI_fragment_shader-based hardware. The dependency chain is what makes me think of this. If you made an ATI_fragment_shader implementation with 4 passes and lots of opcodes/texture ops per pass, you could implement ARB_fp on top of that kind of hardware, as long as you could build a dependency chain.

And even if they’re not, how can you be so sure that the hardware looks anything like ARB_fp?

I don’t like the idea of having different code for arb_fp1 arb_fp1.1 arb_fp2k5.

A new extension for each generation! But isn’t that what’s supposed to change? How future proof is that?

I hope you don’t expect glslang 1.0 to be the final version. If you do, you’ll be sorely disappointed.

Whatever shading language the driver uses will change over time (probably about once per generation). Whether it happens to be glslang or an ISA of some form doesn’t matter.

If speed is what matters most, just use the hardware-specific extensions. If we want shaders that work across generations optimally (compiler dependent of course), a high level language is a good solution. Both forward and backward compatibility are important.

Why is “forward compatibility” important? As long as old shaders work on new hardware, and work faster than they did on older hardware, then you’re getting all the functionality you need.

Also, as I pointed out before, you have failed to give a reason as to why a lower-level solution could not be compiled into just as optimized a solution as the higher-level one. All you’ve done is simply state that this is the case; that doesn’t make it true.

You are forgetting the huge userbase ATi is getting recently.

Larger, maybe. But do you think Id can afford to only sell their games to 40% of the potential market?

Now you’re getting paranoid.

Not really. Allow me to explain.

Without a glslang burned into the core, if instead they went with our approach, Cg would likely become the de facto standard in terms of high-level shading languages. Oh, sure, you might have things like the Stanford shader running around, but they wouldn’t be frequently used in the production of actual graphics products.

This doesn’t do ATi or 3DLabs any good. It helps nVidia’s position, which weakens theirs.

So, explain how it is that nVidia ends up arguing for functionality that helps Cg while ATi and 3DLabs argue against it? It is obviously self-interest on nVidia’s part, but why is it so hard to believe that ATi and 3DLabs aren’t engaging in self-interest of their own?

glslang was a work in progress in the ARB for a while, and CG is an independent project.

Granted that, nVidia took Cg to the ARB once it was apparent that the ARB had decided to use a C-like solution for shaders (something nVidia fought against). However, the ARB (i.e., nVidia’s competitors) refused to use Cg’s syntax even though it is functionally equivalent to what glslang provides (or, at least, provides in hardware). Granted, there’s something to be said for the ARB wanting to keep control of the language, but they could have at least used the basic Cg syntax. That way, we wouldn’t have 2 different hardware-based shading language syntaxes running around (like we do now).

It was nVidia who chose this path

To nVidia’s defense, they started developing Cg before glslang was publicly being tossed about. Indeed, I believe (but am not sure) that Cg was publicly released before 3DLabs unveiled their glslang proposal to the ARB. nVidia saw a need just as much as 3DLabs did. And they went to fulfill that need.

[This message has been edited by Korval (edited 07-28-2003).]

Why all the talk about compilers, x86, SSE, 3dnow! and the rest?

I think we had settled this question.

quote (more or less):

The answer is that HLSL’s advantage over the ASM style is that it is easier and makes you a more efficient coder.

The person writing the compiler will have a hard time coding the HLSL compiler (compared to an ASM compiler), but it might save developers some time.


Are there any other advantages that HLSL has over ASM and vice-versa?

All this considered I think it’s pretty obvious that the language needs to be high level (in the same sense C is high level).

Do you have any actual basis for this claim? While we’re both in agreement that ARB_fp doesn’t cut it, that doesn’t preclude something that looks similar to ARB_fp doing the job.

I don’t preclude something that looks like ARB_fp doing the job, but I’m betting it will be about as high level as C. C was, after all, designed to be a portable assembler. As long as the language supports the demands I outlined above, I don’t particularly care how it looks, as long as there is something c/cg/HLSL/glslang-like or higher level for me to program in. Looking at the history of OpenGL, features that mapped badly to hardware but were there for programmer convenience weren’t exactly success stories, so pushing programmer convenience above everything else is probably not a good idea. It should definitely be considered strongly though.

“Also, as I pointed out before, you have failed to give a reason as to why a lower-level solution could not be compiled into just as optimized a solution as the higher-level one. All you’ve done is simply state that this is the case; that doesn’t make it true.”

If your low level instructions are being compiled and optimized into something that’s not just a mapping, why not use the higher level language? It’s more readable, and hopefully less error-prone.

After re-reading that post, it sounds like you want to write to the metal NOW, for what you think will net you the most performance. Then, in the future, you want your code to be backwards compatible with whatever comes out then.

I’m confused.

why not use the higher level language? It’s more readable, and hopefully less error-prone.

Because it’s far easier to build a fast cross-assembler than a fast C compiler. That is, you usually don’t want your driver to spend ~1 second compiling your shader.

What’s a typical cross-assemble time we’d expect? I’m assuming here that you want to upload a different shader, or set of shaders, every frame. Otherwise, does the compile time really matter?

[This message has been edited by CatAtWork (edited 07-28-2003).]

Now it seems to be a countdown until the new drivers come out supporting 1.5! I just hope that my 5900 Ultra can handle the OpenGL Shader Language.

  • Lurking

Originally posted by Korval:

To nVidia’s defense, they started developing Cg before glslang was publicly being tossed about. Indeed, I believe (but am not sure) that Cg was publicly released before 3DLabs unveiled their glslang proposal to the ARB. nVidia saw a need just as much as 3DLabs did. And they went to fulfill that need.

That is not correct. glslang was first presented at the OpenGL BOF at Siggraph 2001. At that time Bill Mark (Cg’s lead designer) was still working at Stanford as a researcher on the Stanford Real-Time Programmable Shading Project; it wasn’t until October 2001 that he joined NVIDIA (“From Oct 2001 - Oct 2002, I worked at NVIDIA as the lead designer of the Cg language”).

The original “GL2” whitepapers were presented at the ARB meeting in September of the same year and made public in December 2001.

Cg wasn’t offered to the ARB until a year later or so:

“Cg” discussion
NVIDIA wanted to discuss their goals with Cg (although they are not offering Cg to the ARB).

June 2002 ARB meeting.

Those are the facts; draw your own conclusions.

You know, I don’t like the idea of a high-level language that is built into OpenGL. I can’t say why. It… somehow limits my free space. I would prefer every single card having an assembly processor (may it be a different assembly every time) and then having a compiler that translates my high-level code into that assembly, with all possible optimisations. Really, this is the way glslang and all HLSLs work (at the driver level). But with glslang this assembly is hidden inside the driver; the developers would have no access. What I mean is that access to the assembly should be granted openly, so that anyone could write his own HLSL. I think it’s a big mistake to make a HL language part of the core.

Originally posted by Korval:

D = a*b + dot(r + p, z * q)

really do anything for optimization that simply specifying the sequence of “opcodes” doesn’t?

yes, if there is an opcode which does exactly that in an extension of the standard asm support…

thinking of ati here, which could, if a and b were scalars, do it in the parallel unit alongside the dot product (if that is a dp3…)…

makes it much easier to directly compile down to the code of the actual hardware than having to do two steps. it’s like losing precision if you rip a cd to mp3 and then convert to ogg. no matter how high the mp3 settings are, the ogg will never sound as good as if you directly rip it to ogg…

here, “compression artefacts” are losses in optimisation; meaning, compiling twice results in less overall performance in the end.

there is no gain in using asm. none.

yes, if there is an opcode which does exactly that in an extension of the standard asm support…

How is interpreting that equation any easier than interpreting the sequence of opcodes that it would generate? They both evaluate to expression trees; what is it about the C-method that makes optimizing it into a single hardware opcode more likely than the assembly one? There’s nothing that says that each assembly opcode must be equivalent to one or more hardware opcodes. If the assembly compiler sees something it recognizes or knows to look for, then it can optimize it just as well (and probably faster, since recognizing it is easier) than the C case.
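
As a rough sketch of what “sees something it recognizes” can mean in practice (an entirely hypothetical IR and opcode names, nothing vendor-specific): a driver-side assembler can peephole-scan the incoming instruction stream and fuse a MUL that feeds an ADD into a single MAD, which is the same pattern matching a high-level compiler would do on its expression tree.

    #include <stddef.h>

    /* Hypothetical three-address IR over register numbers. */
    enum op { OP_MUL, OP_ADD, OP_MAD };   /* OP_MAD: dst = a*b + c */

    struct ins { enum op op; int dst, a, b, c; };

    /* Fuse "MUL t, a, b" immediately followed by "ADD d, t, c" into
     * "MAD d, a, b, c".  Returns the new instruction count.
     * (A real pass would also check that t is dead afterwards, handle
     * non-adjacent ops, commuted operands, etc.; omitted here.) */
    size_t fuse_mad(struct ins *code, size_t n)
    {
        size_t r = 0, w = 0;
        while (r < n) {
            if (r + 1 < n &&
                code[r].op == OP_MUL &&
                code[r + 1].op == OP_ADD &&
                code[r + 1].a == code[r].dst) {
                struct ins mad = { OP_MAD, code[r + 1].dst,
                                   code[r].a, code[r].b, code[r + 1].b };
                code[w++] = mad;
                r += 2;
            } else {
                code[w++] = code[r++];
            }
        }
        return w;
    }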

thinking of ati here, which could, if a and b were scalars, do it in the parallel unit alongside the dot product (if that is a dp3…)…

Once again, why is it that the assembly compiler can’t recognize these opcodes and do optimizations from them? Nothing is lost semantically in compiling the language into the ISA assembly.

makes it much easier to directly compile down to the code of the actual hardware than having to do two steps. it’s like losing precision if you rip a cd to mp3 and then convert to ogg. no matter how high the mp3 settings are, the ogg will never sound as good as if you directly rip it to ogg…

This is a false analogy, and you should have known better than to propose this one.

Nothing is lost during the compilation to assembly. The ISA was designed such that nothing of actual value to the compiler is lost. The only difference between an expression written as in my example and an assembly-like expression is that the latter is easier to parse. They both contain the exact same information.

there is no gain in using asm. none.

Once again, saying it doesn’t make it true. Making assertions doesn’t win arguments. Arguments win arguments.

Besides, you’re not even thinking like a programmer in this. You’re thinking from having a preconceived notion of, “Assembly bad, C good.” Consider, for a moment, being told that the ISA approach is the way it’s going to go. And now, you have to write an optimizing compiler for it. You don’t have the luxury of saying, “Assembly bad, C good.” You’ve got a job to do. You have to make it work. And, by looking into the problem from that direction, you will come to the realization that it can work, and just as well as the C case.

i myself use a lot of assembly on intel platforms, so i would never say assembly is bad.

but tell me one reason it’s good here?

if you don’t have a perfect mapping from high level to assembler, a mere 1:1 mapping, you will lose performance because you lose info you can use for optimisation. and that IS true in all sorts of assemblies.

else every p4 could by itself determine loops and all that, and rewrite everything for sse simd at runtime.

the p4 is a great example of a platform that does not fit well with the one-for-all x86/x87, and gains a lot by using direct high-level to machine code compilation without going through a very old asm first…

i get up to a 5x speed increase.

as opengl 2.0 and its shader language want to stay around for the next, say, 10 years, too, just like the old one, you have to take care of massive structural changes in hw that cannot get expressed really nicely with asm anymore.

every compilation is a lossy “compression”. my analogy still holds true. for optimisation, it’s lossy.

you will lose performance because you lose info you can use for optimisation

You and others keep repeating this. Yet no one can produce a single example where this is irrevocably true.

the p4 is a great example of a platform that does not fit well with the one-for-all x86/x87, and gains a lot by using direct high-level to machine code compilation without going through a very old asm first…

You’re mistaken about why compilers compile directly to assembly: it’s for the compilation speed gain, not the run-time speed gain.
This same move happened a few years back when C++ compilers would output intermediate assembler (or native assembler) instead of C code for a C compiler.

as opengl 2.0 and its shader language want to stay around for the next, say, 10 years, too, just like the old one, you have to take care of massive structural changes in hw that cannot get expressed really nicely with asm anymore.

There doesn’t have to be one single static version of the assembly language.

Just to take a recent example on old hardware:
If you had F = (A*B) + (D*E)

On standard assembly this would go to a MUL then a MAD instruction (and perhaps a temporary register). However, in register combiners you could have done the whole line in one instruction. So unless you want to add a ton of specific “weird” instructions to the assembly spec, you could never optimize for all these cases without allowing the driver access to the high level language.
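
A sketch of that example (plain C; the mnemonics in the comment are generic, and real combiner capabilities vary per chip):

    /* F = A*B + D*E
     *
     * Generic assembly lowering (two instructions, one temporary):
     *   MUL tmp, A, B
     *   MAD F,   D, E, tmp
     *
     * A register-combiner style unit can evaluate both products and the
     * sum in a single stage, but a fixed MUL/MAD instruction set has no
     * opcode for that; the driver would have to rediscover the pattern. */
    float combine(float A, float B, float D, float E)
    {
        return A * B + D * E;
    }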

The only real argument I can see against a high level language would be a driver not optimizing as much as a person could (or performing unnecessary instructions). However, since there is no specific hardware to target, this low level optimization cannot be done by developers and is best left to the driver.