id's "Rage" engine, id Tech 5, is OpenGL based.

I agree with every word Komat said. That’s exactly how I’d like the API to work and look.

Jan.

Originally posted by knackered:
I don’t get your point; I have to trust them about performance every day, about every single part of the spec they implement. There’s nothing in the spec about performance, just behaviour. In any case, I’ll still pre-warm my shaders at start-up, so the worst-case scenario is still restricted to start-up speed.
As for reliability, this cache system won’t add to their problems, it’s simply a bolt-on to the input and output of their driver code.
If they implement the cache system, then the problem goes away without going down a potentially nasty API change route. If NVidia implement it, ATI will then have no choice but to implement it too, otherwise people will complain about ATI being slower than NVidia. Intel will follow, as usual.

Strangely enough, it was someone from ATI (Evan Hart I think) who posted this idea. “Scan the text your app sends to GL and pick the blob from the database”.

That’s nuts if you ask me.

It’s better to do what D3D + its tools are doing.
IMO, it’s better for the API to offer solutions and leave using them as the developer’s responsibility.
If the developer wants to screw himself with slow compile times, that’s fine.
If the developer wants to screw himself with the issues that come with a precompiled thing, that’s fine.

I do not like this “embedded source” idea. The unblobing is presumably a fast operation while compilation from source code is a very slow one.
Yes, but no slower than what you would have asked for originally.

The problem is that you cannot force a driver to use a blob. The driver decides whether the blob will be functional or not. Presumably the program you’re trying to load is important, so you’re going to give them that string whether it comes from the blob or from your code.

At least if the string is in the blob, it’s guaranteed to always work, so there’s no need for multiple program compilation paths.

With the “embedded source”, any blob load might potentially be as slow as compiling the shader, and you lose this control.
Without the embedding of the source, any blob can simply fail to load, and thus will force you to pass in the string and compile the normal way.

Remember, if IHVs want to get lazy, they still can with your approach. They can make the “binary” blob simply the string, and do the compiling all the time. There is nothing you can do to stop IHV laziness; the best you can do is offer them reasonable avenues for optimization.
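For what it’s worth, the “two compilation paths” being argued over look roughly like the sketch below: try the cached blob, and recompile from source when the driver rejects it. This is only an illustration, written against the glProgramBinary / glGetProgramBinary entry points that later shipped as ARB_get_program_binary; loadCachedBlob, saveCachedBlob and attachAndCompile are hypothetical helpers, and a GL header providing those entry points is assumed.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers for this sketch only.
bool loadCachedBlob(const std::string& key, GLenum* format, std::vector<char>* blob);
void saveCachedBlob(const std::string& key, GLenum format, const std::vector<char>& blob);
void attachAndCompile(GLuint prog, GLenum stage, const std::string& source);

GLuint loadProgram(const std::string& vsSrc, const std::string& fsSrc,
                   const std::string& cacheKey)
{
    GLuint prog = glCreateProgram();

    // Fast path: hand the driver its own blob and see whether it accepts it.
    std::vector<char> blob;
    GLenum format = 0;
    if (loadCachedBlob(cacheKey, &format, &blob))
    {
        glProgramBinary(prog, format, blob.data(), (GLsizei)blob.size());
        GLint ok = GL_FALSE;
        glGetProgramiv(prog, GL_LINK_STATUS, &ok);
        if (ok)
            return prog;
    }

    // Slow path: blob missing or rejected (new driver, different GPU, ...),
    // so compile from source exactly as if the blob never existed.
    attachAndCompile(prog, GL_VERTEX_SHADER, vsSrc);
    attachAndCompile(prog, GL_FRAGMENT_SHADER, fsSrc);
    glProgramParameteri(prog, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE);
    glLinkProgram(prog);

    // Regenerate the cache entry so the next run can take the fast path again.
    GLint length = 0;
    glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &length);
    if (length > 0)
    {
        blob.resize(length);
        glGetProgramBinary(prog, length, nullptr, &format, blob.data());
        saveCachedBlob(cacheKey, format, blob);
    }
    return prog;
}
```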

I have a couple questions that are pestering me.

  1. How does the bottom line factor into IHV decisions regarding OpenGL, as far as time and resources invested in ARB participation, API design, and any marketing?

  2. How do OS specifics factor into API design? I realize of course that the ideal is OS neutrality within the API itself, but given the practical concerns of real-world implementations (DMA scheduling, bus bandwidth, etc.), are there any issues that might guide the API design in ways that are less than obvious? For example, an obvious one is the introduction of PBOs, which as I understand it is predicated on the availability and efficiency of asynchronous DMA transfers.

To frame this in the current topic, I’m wondering if there might be more subtle side effects of OS internals involved in the overall API design going forward. In particular, how might Vista’s new driver model influence design considerations, now and in the future?

Originally posted by Korval:
Without the embedding of the source, any blob can simply fail to load, and thus will force you to pass in the string and compile the normal way.

This is exactly what I want. If it cannot be loaded fast, then the load should fail. This is similar to GLSL shaders, for which it would be better to fail compilation than to fall back to software emulation.

Yes, if I need that shader, I will need to compile it explicitly. The difference is that in that case I can decide when to compile it (or temporarily replace it with a lower quality shader until asynchronous compilation completes). For an example of what I am thinking about, see the following pseudocode utilizing asynchronous compilation.
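The pseudocode referred to here did not survive in this copy of the thread; the sketch below is an illustrative reconstruction of the idea only, and every type and helper in it (Program, ShaderDesc, tryLoadBlob, lowQualityFallback, asyncCompiler, replaceProgram) is hypothetical: fail fast on the blob, render with a cheaper stand-in shader, and let a worker thread with a shared context do the real compile.

```cpp
// Illustrative reconstruction only - the original pseudocode is not preserved.
Program* acquireProgram(const ShaderDesc& desc)
{
    // Blob loading must either succeed quickly or fail; it must never
    // silently fall back to a full recompile inside the driver.
    if (Program* cached = tryLoadBlob(desc))
        return cached;

    // Blob rejected: don't stall this frame on a slow compile.
    // Use a lower-quality replacement shader for now...
    Program* placeholder = lowQualityFallback(desc);

    // ...and queue the real compile on a worker thread (shared GL context).
    // When it finishes, the renderer swaps it in on a later frame.
    asyncCompiler.submit(desc, [=](Program* compiled) {
        replaceProgram(desc, compiled);
    });

    return placeholder;
}
```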


At least if the string is in the blob, it’s guaranteed to always work, so there’s no need for multiple program compilation paths.

You still need code which generates the blobs from source code. Because it is useful during development for the program to operate without the blobs, so you can easily modify the shaders, chances are that you will already have that path anyway.


Remember, if IHVs want to get lazy, they still can with your approach. They can make the “binary” blob simply the string, and do the compiling all the time.

Yes, they can; however, I hope that they will not go that way in this case. Optimizing too aggressively for some case is one thing; going directly against the spirit of the blobs is another.

You still need code which generates the blobs from source code. Because it is useful during development for the program to operate without the blobs, so you can easily modify the shaders, chances are that you will already have that path anyway.
Since the string part of the blob format will be well defined (so that one implementation can read the string from another implementation), it is quite reasonable for a developer to work exclusively with blobs. All they need to do is use an IHV GUID that nobody is using.

Originally posted by Korval:
Since the string part of the blob format will be well defined (so that one implementation can read the string from another implementation), it is quite reasonable for a developer to work exclusively with blobs.
Imho the only reasonable use of blobs is as an optimized loading format.

For all other uses they are vastly inferior to even the current GLSL API, because they have no support for linking multiple shader fragments, no compilation error reporting facilities, and a binary-based structure.

The purpose of the API should be to provide services which you cannot implement yourself. In this case that is fast loading of a GPU-native format using opaque blobs. Regeneration of the blob when it becomes incompatible can easily be handled by the application using the “existing” shader API, so it should be part of some helper library and not part of the API itself.

Originally posted by yooyo:
Maybe something like encryption… The app provides a key to the driver and the driver can decode the encrypted shader and compile it. A separate app can encode all shaders into binary files using common encryption procedures.

Or even better… the app can provide a decoder callback to the driver. When the driver compiles a shader, it calls back into the app to decode the binary shader into an ASCII string, then compiles it.
Encrypting shaders in either of the ways you describe won’t actually provide much protection against a determined hacker - a simple GL wrapper library will enable getting at the plain text shader in both cases. That’s also ignoring the fact that, because your app would never run on it otherwise, Mesa would provide an open source implementation of the decoder - so a hacker could get at your shader with little more effort than setting a breakpoint.

If a developer doesn’t want to distribute their shaders as plain text they can store an encrypted version on disc and decode it before passing it to the API. Anything beyond that will be more awkward to use than it will be for hackers to bypass.
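A minimal sketch of that application-side approach, assuming a trivial XOR obfuscation purely for illustration (any real cipher slots in the same way); readFile is a hypothetical helper, and note that the plain text is still visible to anything that hooks glShaderSource.

```cpp
#include <string>
#include <vector>

std::vector<unsigned char> readFile(const char* path);   // hypothetical helper

// XOR against a key is only a stand-in for whatever cipher you prefer;
// it merely keeps the source out of casual view on disc.
std::string decodeShader(const std::vector<unsigned char>& encrypted,
                         const std::string& key)
{
    std::string plain(encrypted.size(), '\0');
    for (std::size_t i = 0; i < encrypted.size(); ++i)
        plain[i] = static_cast<char>(encrypted[i] ^ key[i % key.size()]);
    return plain;
}

void loadEncryptedShader(GLuint shader, const char* path, const std::string& key)
{
    const std::string plain = decodeShader(readFile(path), key);
    const char* src = plain.c_str();
    glShaderSource(shader, 1, &src, nullptr);   // plain text crosses the API here
    glCompileShader(shader);
}
```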

As for storing compiled shaders, I think leaving caching blobs to the driver is a bad idea.

The fastest way for an application to load multiple blobs will be to load one file, which contains all the blobs, into memory in one go and call glProgramBLOB, or whatever, passing different blob pointers as it needs to from there. Having the driver mess about with multiple files, or even a single file acting as a database of varying numbers of blobs, will be slower.

Caching of blobs should be left up to applications because they know their usage patterns and will be best able to pick a scheme which suits them.
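As an illustration of that scheme: read the whole cache file once, then hand out pointers into that single buffer. The file layout (count, then per-entry format/size/payload) and the helpers are invented for the example, and glProgramBinary from ARB_get_program_binary is used as a concrete stand-in for the “attach program blob” call being discussed.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// One cache file = uint32 count, then per blob: format, size, payload.
struct BlobView { GLenum format; const void* data; GLsizei size; };

std::vector<BlobView> indexBlobCache(const std::vector<char>& file)
{
    std::vector<BlobView> blobs;
    const char* p = file.data();

    std::uint32_t count = 0;
    std::memcpy(&count, p, sizeof count);  p += sizeof count;

    for (std::uint32_t i = 0; i < count; ++i)
    {
        std::uint32_t format = 0, size = 0;
        std::memcpy(&format, p, sizeof format);  p += sizeof format;
        std::memcpy(&size,   p, sizeof size);    p += sizeof size;

        BlobView b;
        b.format = (GLenum)format;
        b.size   = (GLsizei)size;
        b.data   = p;                            // points into the one big buffer
        p += size;
        blobs.push_back(b);
    }
    return blobs;
}

// Usage: one sequential read, then pass pointers as programs are needed.
//   std::vector<char> file = readWholeFile("shaders.blobcache");  // hypothetical
//   auto blobs = indexBlobCache(file);
//   glProgramBinary(program, blobs[i].format, blobs[i].data, blobs[i].size);
```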

I don’t understand why there’s a general dislike of the transparent driver caching idea. Nobody’s actually given a reason why it’s a bad idea.

That’s nuts if you ask me.

As for storing compiled shaders, I think leaving caching blobs to the driver is a bad idea.
You just keep saying that you’d like more control over managing vendor-specific blobs, for some weird reason. Next you’ll be asking for control over how it searches for a matching pixel format, giving it its own extension and a chapter in the red book.
The caching system would just work, all the time, no matter what, without anyone having to do any compiler switch nonsense (which is effectively what you’re talking about).
Give reasons against it.

Originally posted by knackered:
I don’t understand why there’s a general dislike of the transparent driver caching idea. Nobody’s actually given a reason why it’s a bad idea.

As for storing compiled shaders, I think leaving caching blobs to the driver is a bad idea.
You just keep saying that you’d like more control over managing vendor-specific blobs, for some weird reason.
<straw man snipped>
Give reasons against it.

Nobody’s said that the driver caching blobs is the worst idea ever, we just don’t think it is the best.

For most invocations of most applications, shaders will have been compiled and their blobs cached somewhere from a previous run. This should therefore be the behaviour to optimise for.

If a driver stores blobs as separate files, then it will have the overhead, for each blob, of finding the file on disc, opening it, loading the data and closing the file. Separate files are more likely to be spread about the disc, leading to more disc head movement. Optimisations in the file system may help, but all of this will ultimately slow down the loading of blobs. If the driver implements some sort of database in a single file to reduce disc access overhead, then it will have to manage the complexity of adding and deleting entries and the corresponding fragmentation within the database.

An application can, at install, first run or “driver changed I’ve got to do some reconfiguration” time, store all the blobs into one, sequentially written file. This will give the best chance of that application’s blobs being written to disc in a minimally fragmented file. When it comes to reload the blobs, this will be done by opening only one file and, because it will be a sequential read of a minimally fragmented file, loaded into memory quickly. The application then has all the blobs ready to pass to the ‘attach program blob’ API as needed.
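A sketch of that install/first-run step: compile everything once, then append each driver blob to a single, sequentially written file in the same count/format/size/payload layout as the reader sketched earlier. Entry points are those of ARB_get_program_binary, used here for concreteness; error handling is omitted.

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Compile everything once (at install / first run / "driver changed" time),
// then write every blob to one file in a single sequential pass.
void writeBlobCache(const std::vector<GLuint>& programs, const char* path)
{
    std::ofstream out(path, std::ios::binary);

    const std::uint32_t count = (std::uint32_t)programs.size();
    out.write((const char*)&count, sizeof count);

    for (GLuint prog : programs)
    {
        GLint length = 0;
        glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &length);

        std::vector<char> blob(length);
        GLenum format = 0;
        glGetProgramBinary(prog, length, nullptr, &format, blob.data());

        const std::uint32_t fmt  = (std::uint32_t)format;
        const std::uint32_t size = (std::uint32_t)length;
        out.write((const char*)&fmt,  sizeof fmt);
        out.write((const char*)&size, sizeof size);
        out.write(blob.data(), size);            // sequential append, no seeking
    }
}
```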

Leaving blob caching to the driver will undoubtedly be less work for application developers, and may allow things like a new driver recompiling all the cached shaders as part of the driver installation, but ultimately, applications loading blobs themselves in the manner described above will be faster for most invocations of most applications.

That is why I think it should be left to applications.

I don’t understand why there’s a general dislike of the transparent driver caching idea. Nobody’s actually given a reason why it’s a bad idea.
If you disregard the actual reasons given thus far, then yes, nobody’s given an actual reason. However, if you actually pay attention to the responses to the idea:

Korval: The only way to guarantee this (or as close as it gets) is to have something that can be written into an OpenGL spec.

Komat: And I already got burned by expecting that the uniform variables are really variables which can be freely changed, and not something that will cause shader reoptimization.

Davej: Caching of blobs should be left up to applications because they know their usage patterns and will be best able to pick a scheme which suits them.

So there have been plenty of reasons given against it. And the only argument for it is that it is transparent to the application writer.

There is also the matter of when does the driver clean up the cache? If you are keying on the shader then, unless I’m mistaken, a small change in a shader means another key, another entry and more disk space gone.

When I’m working on shaders I tend to go through many revisions; with this transparent caching it seems they will all be left lying about until such time as I reinstall Windows, and most of them will be taking up disk space for no good reason.

The same applies to games; if I uninstall a game I expect it all to vanish, not to have files cached by the driver lying around which might never get used again.

Finally, if the driver regenerates this cache on a driver change (as required), it’s going to add more time to installing the driver (which frankly is long enough as it is already) and end up doing redundant work if, again, most of those cached shaders are unused.

This discussion has become off-topic… but anyway…

Plain text or binary blob? If you deliver plain text with your app, you can’t protect your IP. Developers can encrypt shaders, but that is an “easy to hack” solution. Maybe we can obfuscate the shader?

What if you deliver a binary blob? What is this binary blob anyway? Is it the compiled shader returned (in binary form) from the driver? We have different vendors, each with several hardware classes, and the same shader can be compiled in different ways depending on the underlying hardware and driver. Even more, if the user changes image-quality settings in a game, this will probably trigger changes in the shaders and force recompilation. If an app uses exactly the same shader but with different parameters and context, it can be compiled differently… same shader, different results.

So… this binary blob must be some bytecode in a common form supported by all vendors. I’m afraid this is never going to happen. Even then, such bytecode can be reversed back to plain text (just a little obfuscated).

If the driver has an internal binary shader database, the app still has to provide the original shader for compilation. The driver could cache binary blobs in the app folder for later use. This could be exposed in the driver control panel… a simple checkbox in the UI. It would speed up game startup on the second run.

Some hardware vendors use dirty tricks and shader replacement by detecting the executable filename, so they already have precompiled shaders.

In the end… a compilation speedup can be achieved by caching shaders on the local disk. The application should not have to be aware of this. IP can be protected to some degree, but it is not safe enough (like any other digital protection).

JC’s engines don’t use too many shaders, so Doom 3, Q4, ETQW, etc. have quite fast startup.

It’s better to do what D3D + its tools are doing.
Yes, that is good.

I see only one argument against driver-based caching, and it is:

Davej: Caching of blobs should be left up to applications because they know their usage patterns and will be best able to pick a scheme which suits them.

But even this is not so important. The driver can detect basic usage patterns easily. It can maintain usage statistics for each shader and cache it only if it is a) long enough b) used often enough c) add your choice

Also, the extension may introduce some driver hints (like glSetShaderParameter(shader, GL_CACHE_SHADER_EX, GL_TRUE)).

Of course, there may be some special cases… Your application generates thousands of shaders on the fly and uses them only once? Well, you probably won’t be caching them yourself anyway. Still, if by some small chance a particular shader is generated more often than others, a driver with the simplest usage statistics will notice it immediately and put it into the cache.

I imagine something like this: the driver creates a file per application, somewhere under the user’s data. This file contains the cached shaders. Each time the rendering context is deleted, the driver evaluates the statistics and updates the cache. The implementation should be simple enough; I am sure that a more or less proficient programmer could code such a system in several days.

@bobvodka: The reinstallation of drivers won’t take more time; the uninstall utility just deletes the cache files.

And of course, Korval is right about having to specify this behaviour in the spec. Still, I don’t see why the OpenGL spec can’t mention saving data to the user’s hard drive.

I like such an approach very much, because it is very transparent. It “just allows” new functionality, even for applications that aren’t aware of it. And as mentioned, some fine-grained control over caching could be provided via extensions. Maintaining the blobs manually destroys the abstraction.

So… this binary blob must be some bytecode in a common form supported by all vendors.
You’re misunderstanding the conversation.

The debate is between three alternatives:

1: Let the IHVs optimize program compilation by hoping and praying that they will implement some kind of caching system that compiles a particular program once and then uploads the compiled program when it detects you are trying to compile the same program again.

2: Add an extension that provides the ability to retrieve a “binary blob” from a program object. This blob will contain the program in an IHV-specific format of an indeterminate nature. The only part of the format that is cross-platform is a multi-byte header identifying the GL implementation that created it. If you load this blob into a separate implementation, it will simply fail. And even the implementation that created it can decide not to load it again (suggesting that it can optimize the string version better now, or something).

3: Same as 2, except that the binary blob must also contain the string form of the shader in addition to the implementation identification code and the implementation’s binary data. The string will be stored in a well-defined format. This allows a program to use blobs built from any implementation in any other, because worst-case, loading it simply provokes a recompilation.

We are not suggesting a new shader language, whether textual or binary.
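For concreteness, an option-3 blob might be laid out along the lines below. This is purely illustrative, not a proposed format; every field name is made up for the example.

```cpp
#include <cstdint>

// Purely illustrative layout for option 3: a well-defined header and source
// string that any implementation can read, followed by opaque vendor data
// that only the implementation identified in the header will accept.
struct ProgramBlobHeader
{
    std::uint32_t magic;          // identifies the blob container format itself
    std::uint32_t vendorId;       // identifier of the implementation that wrote it
    std::uint32_t sourceOffset;   // well-defined part: the GLSL source string
    std::uint32_t sourceLength;
    std::uint32_t binaryOffset;   // opaque part: vendor-specific compiled program
    std::uint32_t binaryLength;   // zero if the vendor chose not to emit a binary
};
// A foreign implementation ignores [binaryOffset, binaryLength) and simply
// recompiles from the embedded source; the authoring implementation can use
// the binary directly, or fall back to the source if it decides not to.
```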

The driver can detect basic usage patterns easily.
Actually, no: OpenGL pre-3.0 has shown that detecting usage patterns sucks. The detection either gets them wrong or gets them kinda right or whatever. It’s never as good as it would be if the driver and the code established a real contractual obligation that was enforceable in some way. That’s why 3.0 abandons anything that requires such detection in favor of a more rigid approach.

It’s also the reason why 3.0 won’t have hints. Or won’t use them for much.

The implementation should be simple enough; I am sure that a more or less proficient programmer could code such a system in several days.
Yes, and implementing a functioning glslang compiler should be simple enough that 2 years ought to be enough. And yet both nVidia and ATi have failed miserably at it.

IHVs have violated the trust relationship enough that trusting them on shader stuff is just stupid. I’d much prefer a contract written in spec-language that you can at least verify is being honored.

Also, this doesn’t answer Komat’s issue. That is, what if you can work around the long compile somehow? What if you can compile a subset of shaders that take less time initially, just to get up and running, and then compile the others in a separate thread? Your method would make it impossible to tell if recompiling is going to take place, so it would be impossible to work around the long compiles in the event of a forced recompilation.

Now, I don’t necessarily agree with the point (mainly since I don’t plan to explore such a solution), but I can’t really find fault with it, as it would make a potentially important kind of solution impossible to implement for little reason.

the uninstall utility just deletes the cache files.
Immediately causing every GL program that relied upon that cache to suddenly take 10 minutes to start up when it used to take 20 seconds. This is a pretty strong argument against this.

I’d rather not put that kind of basic application performance in the hands of people who have been shown, time and again, to be incapable of writing a C compiler.

Still, I don’t see why the OpenGL spec can’t mention saving data to the user’s hard drive.
Because OpenGL specifies behavior, not optimizations. What you’re talking about is an optimization.

The GL spec can only say what will happen to the internal state (contents of images and framebuffers, etc) of a context when you call certain API functions. It certainly cannot state what will happen to information when the context is destroyed and a new one created perhaps days later. “Behavior” is that which you can detect has happened in the state by looking at the state.

You cannot detect that something has been put into a cache unless the cache object itself is a GL object that you can talk to. You cannot detect that some file has been put somewhere, etc. In short, this is not behavior.

It’s also the reason that GL doesn’t specify windowing system dependent stuff, even if it could in an OS-neutral way. It is simply outside of its domain.

I think I have made my opinion on this matter known previously:
http://www.opengl.org/discussion_boards/ubb/ultimatebb.php?ubb=get_topic;f=7;t=000626#000012

And I can still say I prefer the D3D approach. (In fact I seem to recall that Nvidia engineers were pushing for this when GLSL was first proposed, the argument being that they would have to optimize for D3D anyway and they hated adding 1MB+ of bloat to their code.)

To reiterate another post of mine:

I still think an intermediate representation is a good idea even if the compile times are the same (which I highly doubt).

  • Easy to see when the compiler does “dumb things”.
  • Don’t have to worry about code-parsing bugs. (These should not happen, but they do.)
  • Dead-code elimination and code-folding optimizations can take as much time as needed.
  • Don’t have to worry about spec violations from different vendors (or even changes between driver versions)
  • Easier for driver writers to support. (probably)

I know a lot of these problems do not exist in “theory”, but in practice I believe compiling at runtime adds a huge surface area for failures.

Korval, my vote goes to:

2: Add an extension that provides the ability to retrieve a “binary blob” from a program object.
I agree with pretty much everything Komat has said in this topic in support of this option.
I don’t want the source in the blob because
a) 99% of the time you don’t need it, so it’s a waste of time and memory loading it
b) I want the choice to do the recompilation in the background or at a later time

As for encrypting shaders (yooyo), any scheme to pass encrypted code to the driver will either need a decoder in the driver that will be hacked in no time, or will pass unencrypted data where it can be intercepted.
Much better to just publish your fabulous shaders in GPU Gems so we can all admire your work.

I still think an intermediate representation is a good idea even if the compile times are the same.
Whether it is or not is entirely irrelevant. Because it’s not going to happen.

Continuing to extol the virtues of such a system is meaningless posturing in the face of this fact. Back when the debate was going on, all of these things were brought before the ARB. They considered them and rejected them in favor of the advantages of glslang.

So, we are where we are. Proposing “solutions” that aren’t going to be implemented is useless. The one thing that the 3 outlined possibilities have in common is that they could all happen.

Fair enough, I understand davej’s objection.
I also understand sqrt’s objection.
The D3D way it should be.