id's "Rage" engine, Tech5 is OpenGL based

Yellow press at its best.
Catalyst drivers before 7.2 didn’t contain an OpenGL ICD implementation under Vista.
This article was comparing HW OpenGL under XP with MS SW OpenGL under Vista. Pure FUD.

I’ve been running with acceleration on NVIDIA HW since last year. ATI was very late releasing drivers (at least for FireGL HW, Q2 2007). There is a performance hit (5%-15%) for windowed apps, but it’s workable. The biggest problem, however, is interop with the Aero DWM: just about any 2D write operation to the window will corrupt the DWM composite buffer.

The biggest problem, however, is interop with the Aero DWM: just about any 2D write operation to the window will corrupt the DWM composite buffer.
That’s a Microsoft Vista “feature” and documented here:
http://www.opengl.org/pipeline/article/vol003_7/

GDI surfaces are separate from 3D surfaces and you can’t mix them anymore under Vista.
Aero only stays on for pixel formats with the new PFD_SUPPORT_COMPOSITION flag set, and that flag and PFD_SUPPORT_GDI are mutually exclusive.
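For anyone who hasn’t hit this yet, here’s a minimal sketch (untested, Vista-era Win32 assumed; older SDK headers may not define the flag, hence the #ifndef) of the kind of pixel format request that keeps Aero composition on:

#include <windows.h>

#ifndef PFD_SUPPORT_COMPOSITION
#define PFD_SUPPORT_COMPOSITION 0x00008000
#endif

BOOL SetCompositionFriendlyPixelFormat(HDC hdc)
{
    PIXELFORMATDESCRIPTOR pfd;
    ZeroMemory(&pfd, sizeof(pfd));
    pfd.nSize    = sizeof(pfd);
    pfd.nVersion = 1;
    /* Note: no PFD_SUPPORT_GDI here -- it cannot be combined with composition. */
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL |
                     PFD_DOUBLEBUFFER   | PFD_SUPPORT_COMPOSITION;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;
    pfd.iLayerType = PFD_MAIN_PLANE;

    int format = ChoosePixelFormat(hdc, &pfd);
    return format != 0 && SetPixelFormat(hdc, format, &pfd);
}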

This was documented by Microsoft way before that pipeline article, and I’ve known about the problem since the earliest beta releases. Just because it’s documented doesn’t make it any more palatable and doesn’t eliminate the problem. Even if your application has an absolutely pure 3D pipeline and can typically interoperate with the DWM, an external application (e.g., the Spy++ Finder Tool) can corrupt your composited image. Constant frame rate applications will see a glitch; static frame applications will need to manually refresh the window (or figure out a way to circumvent the corruption).

What will happen to DirectX 10 performance on Vista when this happens (C|NET):

http://www.news.com/8301-10784_3-9785337-7.html?tag=nefd.only

While Vista was originally touted by Microsoft as the operating system savior we’ve all been waiting for, it has turned out to be one of the biggest blunders in technology.

The time is up. Microsoft must abandon Vista and move on. It’s the company’s only chance at redemption.

Ouch!

When I said Vista was the new Windows Me, I was wrong; surprisingly, it’s shaping up to be worse.

IMHO it makes a lot of the issues surrounding 3D on Vista irrelevant. The bigger issue is in fact that you can’t get D3D 10 on XP. My question is: how likely is a reversal on D3D 10 availability on XP? Is OpenGL going to be the only game in town for the Windows mass market for next gen 3D features?

Is OpenGL going to be the only game in town for the Windows mass market for next gen 3D features?
No. Despite whatever nonsense articles get written (saying that MacOSX is “hot on its tail” is laughable, as is decrying the OS for assisting DRM when the alternative was not being able to play DRM’d movies at all), Vista is in fact the future of Windows. Whether it takes one or two Service Packs, whether it takes a year or two for drivers to mature into stability (not Microsoft’s fault), people will eventually switch.

There was another kernel and Windows OS when ME came out (not that ME forced XP; the NT kernel was always going to be Windows’ future); Microsoft has no alternative but to live on the Vista codebase and expand upon it.

I suspect that game developers will either accept cross-developing for D3D9 and D3D10 or just switch to OpenGL 3.0. The deciding factor will likely be how fast and on the ball IHVs are with GL 3.0 implementations. If the spec ships soon, and nVidia, ATi, and Intel can get good drivers out within 2 months of that (not just beta crap, and certainly nothing that remotely resembles anything ATi has farted out as far as GL support goes), I suspect developers will be inclined to switch to GL 3.0.

Though I suspect the ARB will need to quickly resolve one of OpenGL’s remaining annoying problems: precompiling shaders. It’s one of the last real warts left in the language, and it’s a source of frustration for developers who need lots of shaders.

It seems there’s a burgeoning industry out there that disagrees with you and Microsoft on that score.

I hope you’re right about the move to OpenGL 3. D3D 9 with fragment-based alternatives will do just fine for all the displacement stuff that gets promoted as D3D 10’s highest visibility trick. The only surprise is that stuff actually ships (or screenshots are published) without a fragment-based displacement path in D3D 9; it’s downright perplexing. With the possible exception of the Cascades demo I have not seen any D3D 10 software that really justifies the API, and even then it’s not clear that Cascades wouldn’t be better off in software and fragment shaders, with broader support from a developer’s standpoint (and a bigger market, i.e. XP).

Agreement on precompiled shaders is non-trivial; compilers and underlying implementations are currently free to differ profoundly. It’s a great idea until you have multiple companies years into their respective investments in their shader optimizations, intermediate representations and instruction sets.

On the precompiled shaders topic: compiling shaders into implementation-dependent binary blobs at first launch and then reusing them on subsequent runs does not seem unsolvable to me.
If the driver or hardware changes, the blob would simply be recompiled.
It would still allow profound implementation differences, and would help 99% of the time.

ZbuffeR, that’s actually a pretty good idea: have the binary blob actually store the program text itself. That way, a mismatch just doesn’t matter; the driver can always fall back to recompiling from source. So long as each IHV gets a unique identifier for their own binary portion (a GUID should suffice), and the textual format is rigidly specified, I don’t see the problem.
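Just to make that concrete, a hypothetical blob layout (every name here is made up for illustration, nothing official) might look like this:

/* Hypothetical on-disk layout for a program blob -- purely illustrative.       */
/* The header and the source section would be rigidly specified by the spec;    */
/* the vendor section stays an opaque, IHV-specific binary.                     */
typedef struct ProgramBlobHeader
{
    char          magic[8];        /* e.g. "GLPBLOB"                            */
    unsigned char vendorId[16];    /* the IHV's GUID for its opaque section     */
    unsigned int  driverRevision;  /* lets the driver spot a stale binary       */
    unsigned int  sourceOffset;    /* offset/length of the original GLSL text   */
    unsigned int  sourceLength;
    unsigned int  binaryOffset;    /* offset/length of the vendor-specific code */
    unsigned int  binaryLength;
} ProgramBlobHeader;

A driver that doesn’t recognize vendorId, or whose revision no longer matches driverRevision, just ignores the opaque section and recompiles the embedded source.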

For me, this could be transparently implemented by the driver using the source code as a huge primary key in a database that links to a previously built binary blob. Even a dbase query like that would be infinitely faster than this compile/link thing (like a ms, if that).

For me, this could be transparently implemented by the driver using the source code as a huge primary key in a database that links to a previously built binary blob.
Um, no. I do not believe that drivers should randomly start storing databases on my hard disk.

Particularly since I develop shaders, and will therefore have thousands of entries (many of which are non-functional).

This mechanism would be best implemented explicitly, with API entry points to retrieve / upload the binary blob (and only if desired).
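Something along these lines, say (both entry points are hypothetical, they are not in any current spec or extension):

/* Hypothetical explicit retrieve/upload entry points -- not in any shipping spec. */
void glGetProgramBinary(GLuint program, GLsizei bufSize, GLsizei *length,
                        GLenum *binaryFormat, void *binary);
void glProgramBinary(GLuint program, GLenum binaryFormat,
                     const void *binary, GLsizei length);
/* Uploading a blob would set LINK_STATUS just like glLinkProgram does, so the     */
/* application can detect a rejected blob and fall back to compiling from source.  */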

Precompiled shaders would be difficult because a new driver version could optimize the shader code better.

Maybe something like encryption… The app provides a key to the driver, and the driver can decode the encrypted shader and compile it. A separate app can encode all shaders into binary files using common encryption procedures.

Or even better… The app can provide a decoder callback to the driver. When the driver compiles a shader, it calls back into the app to decode the binary shader into an ASCII string and then compiles it.
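Roughly like this, I guess (a hypothetical sketch; no such callback exists in GL today):

/* Hypothetical decoder-callback registration for keeping shader text off disk.  */
typedef GLsizei (*GLshaderDecodeFunc)(const void *encoded, GLsizei encodedSize,
                                      char *decodedSource, GLsizei maxSize);
void glShaderSourceDecoder(GLuint shader, GLshaderDecodeFunc decoder,
                           const void *encoded, GLsizei encodedSize);
/* The driver invokes 'decoder' only when it actually needs the plain GLSL text, */
/* so the shader never has to ship as readable ASCII in the application's files. */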

It is very hard to strike a compromise between IP, compatibility and performance.

Actually, something like what knackered suggested (caching compiled shaders) would be the simplest solution. Just let the driver do it. It could also clean the cache from time to time, deleting rarely used entries. The cache should be stored somewhere in the user’s home directory.

Precompiled shaders would be difficult because a new driver version could optimize the shader code better.
Yes, and if that’s the case, the driver can detect this by comparing its driver revision number to the one it stored in its binary data. That’s an implementation detail best left up to IHVs.

Really, putting the text in the actual binary blob was the basic sticking point, from my perspective. By doing that, you ensure that any implementation can always recompile the program, which makes binary blobs interoperable across implementations.

Oh, and another reason not to do it transparently: transparent behavior cannot be specified by the GL spec. It can’t say, “Oh, btw, you should create some database somewhere so that, between invocations of the compiler, you can check to see if a program is set up exactly as a prior one and load the shader from there.” It can only specify behavior whose results the user can detect on screen. Performance optimizations have always been un-specifiable.

This way, implementers are forced to give us a back-door. The only potential problem is that it isn’t guaranteed to be faster, since an implementation could store nothing but the program in the binary blob.

Looking at it from a GL 3.0 perspective, it is exactly like requesting a binary form of a program template object.

Performance optimizations have always been un-specifiable? That’s the whole point of OpenGL: it specifies correct behaviour, and the implementation is free to optimise as it sees fit, including whatever method it chooses to speed up shader build times. I’m not suggesting the cache be part of the OpenGL specification; I’m just suggesting the IHVs do something like this to work around the otherwise elegant design of GLSL without cluttering up the spec with some hideous vendor-specific binary blob crap that, up until now, hasn’t been required in any of the other OpenGL mechanisms.
Honestly, if Nvidia give me a method to retrieve the binary data they send to the card after linkage, and a method for uploading a previously generated blob, I could write this functionality into their driver within a couple of hours.
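To put some meat on that: a rough sketch of the cache, layered on the hypothetical retrieve/upload entry points from a few posts up. LoadCachedBlob, SaveCachedBlob, CompileAttach and LinkStatusOK are made-up helpers, not real API.

// Sketch of a source-keyed program cache (assumptions as stated above).
#include <cstdio>
#include <string>
#include <vector>

// FNV-1a hash of the concatenated shader sources -- the "primary key".
static std::string SourceKey(const std::string& src)
{
    unsigned long long h = 1469598103934665603ULL;
    for (size_t i = 0; i < src.size(); ++i)
        h = (h ^ (unsigned char)src[i]) * 1099511628211ULL;
    char buf[32];
    std::sprintf(buf, "%016llx", h);
    return std::string(buf);
}

GLuint BuildProgramCached(const std::string& vsSrc, const std::string& fsSrc,
                          const std::string& cacheDir)
{
    GLuint program = glCreateProgram();
    const std::string path = cacheDir + "/" + SourceKey(vsSrc + fsSrc) + ".bin";

    GLenum format = 0;
    std::vector<char> blob;
    if (LoadCachedBlob(path, &format, &blob))            // plain file read (placeholder)
    {
        glProgramBinary(program, format, &blob[0], (GLsizei)blob.size());
        if (LinkStatusOK(program))                       // checks LINK_STATUS (placeholder)
            return program;                              // cache hit: compile/link skipped
    }

    // Cache miss, or the driver rejected a stale blob: do the normal compile/link,
    // then fetch the freshly built binary and write it back to the cache.
    CompileAttach(program, GL_VERTEX_SHADER, vsSrc);     // glShaderSource/glCompileShader/... (placeholder)
    CompileAttach(program, GL_FRAGMENT_SHADER, fsSrc);
    glLinkProgram(program);

    GLsizei length = 0;
    blob.resize(1 << 20);                                // assume 1 MB is plenty for a sketch
    glGetProgramBinary(program, (GLsizei)blob.size(), &length, &format, &blob[0]);
    SaveCachedBlob(path, format, &blob[0], length);      // plain file write (placeholder)
    return program;
}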

I’m just suggesting the IHVs do something like this to work around the otherwise elegant design of GLSL without cluttering up the spec with some hideous vendor-specific binary blob crap that, up until now, hasn’t been required in any of the other OpenGL mechanisms.
Consider this.

Are you willing to trust the performance of your application to people who, thus far, seem incapable of even implementing a functional compiler for a C-like language? Regardless of the fact that IHVs have every reason to want to release solid, stable drivers, they seem absolutely incapable of doing so. Even if you only look at the program compiling/linking part, they’re all over the place.

Basically, if you’re wrong in your trust, and you go ahead with a plan to implement a 2,000-shader application, then your app takes 10 minutes to load every time.

I don’t feel comfortable relying on them to get the job done at this point. The only way to guarantee this (or as close as it gets) is to have something that can be written into an OpenGL spec.

It may not be the prettiest way of doing it, but if it’s in the spec, then they have to implement it. Even if they implement it crappily, it’s still better than what we have now.

I don’t get your point; I have to trust them about performance every day, about every single part of the spec they implement. There’s nothing in the spec about performance, just behaviour. In any case, I’ll still pre-warm my shaders at start-up, so the worst-case scenario is still restricted to start-up speed.
As for reliability, this cache system won’t add to their problems; it’s simply a bolt-on to the input and output of their driver code.
If they implement the cache system, then the problem goes away without going down a potentially nasty API-change route. If NVidia implement it, ATI will then have no choice but to implement it too, otherwise people will complain about ATI being slower than NVidia. Intel will follow, as usual.

Originally posted by Korval:

Really, putting the text in the actual binary blob was the basic sticking point, from my perspective. By doing that, you ensure that any implementation can always recompile the program, which makes binary blobs interoperable across implementations.

I do not like this “embedded source” idea. Unblobbing is presumably a fast operation, while compilation from source code is a very slow one.

With them completely separate, you can choose when to do each of them. For example, silently load the blobs as part of program startup, knowing that the time will be reasonable, or compile the programs at a more appropriate place (e.g. before entering the level or during the intro movie, so the user at least gets the “in menu” experience quickly after starting the program) while displaying a notification to the user that this might take some time.

With the “embedded source” approach, any blob load might potentially be as slow as compiling the shader, and you lose this control.

Unblobbing and compilation also have different semantics. When I successfully unblob the blob, I know that it matched the HW and I am done with it. When I compile a shader, I know that I need to store the resulting blob. With the driver recompiling the shader based on the source from the blob, the unblobbing API would need to report that event to the application so a new blob can be stored for the shader. While this is not a technical problem, it might indicate that completely separate compilation and unblobbing is the cleaner approach.
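For what it’s worth, the difference in semantics could look like this at the call site (glProgramBinary is the hypothetical upload entry point sketched earlier; CompileAndLinkFromSource, StoreNewBlob and the GL_PROGRAM_WAS_RECOMPILED query are all made up):

/* Separate compile and unblob: the result tells me exactly what to do next.    */
glProgramBinary(prog, format, blob, size);      /* hypothetical upload call     */
GLint ok = 0;
glGetProgramiv(prog, GL_LINK_STATUS, &ok);
if (ok)
{
    /* the blob matched the HW/driver -- nothing new to store */
}
else
{
    CompileAndLinkFromSource(prog);  /* my own code path; I know a fresh blob is needed */
    StoreNewBlob(prog);              /* made-up helper around the hypothetical glGetProgramBinary */
}

/* With embedded source, the driver may silently recompile inside the unblob    */
/* call itself, so the API would additionally have to report that event (say,   */
/* via a hypothetical GL_PROGRAM_WAS_RECOMPILED query) before the application   */
/* knows it should replace the stored blob.                                     */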

Originally posted by knackered:
I don’t get your point; I have to trust them about performance every day, about every single part of the spec they implement.
And I already got burned by expecting that uniform variables are really variables which can be freely changed, not something that will cause a shader reoptimization.