OpenGL Extensions

I noticed that many of you guys use functions from OpenGL extensions. My question is this: aren’t these functions hardware dependent? That is, they work on one specific graphics card but not on another. Does programming with these functions mean that my code won’t run on other graphics cards, making it less portable? What’s the point of OpenGL, then, with these hardware-specific extensions?


Well, for one, a lot of extensions are supported on a number of cards (e.g. multitexturing). It’s the same issue as MMX, SSE or 3DNow: using them makes the code faster but less portable. If you’re building an application that is supposed to run on a wide range of machines, then you can either use standard GL only, or use extensions for specific problems with a second branch for when the extension isn’t supported (you can query that at runtime). Where I work, most of the time our applications are targeted at specific graphics cards, so we can use all the extensions available.
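For what it’s worth, the runtime query mentioned above usually amounts to searching the string returned by glGetString(GL_EXTENSIONS). Here is a minimal sketch of the matcher, with the GL call itself omitted so the snippet stands alone; note that a plain strstr() is not enough, because one extension name can be a prefix of another:

```c
#include <string.h>

/* Return 1 if `name` appears as a full space-separated token in `ext`
 * (e.g. the string returned by glGetString(GL_EXTENSIONS)).
 * A bare strstr() would falsely match "GL_EXT_texture" inside
 * "GL_EXT_texture3D", so we check the token boundaries too. */
static int has_extension(const char *ext, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext;

    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext || p[-1] == ' ');
        int ends   = (p[len] == ' ' || p[len] == '\0');
        if (starts && ends)
            return 1;
        p += len;
    }
    return 0;
}
```

At startup you would branch once: if has_extension() reports, say, GL_ARB_multitexture, take the fast path; otherwise fall back to the multipass standard-GL path.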

Thank you Olive for the clarification. Anyway, I still wonder if we are going back to the old days, when the programmer had to detect the hardware and accommodate his code to it. One of the benefits of modern operating systems is that they isolate you from the hardware so that you can write code independently of it. OpenGL came to bridge the gap between different platforms when rendering graphics, but extensions and different hardware vendors seem to widen this gap!

Well, it’s just a thought

[This message has been edited by softland_gh (edited 12-22-2000).]

I think it’s pretty clear that the benefits of extensions massively outweigh the disadvantages.

Extensions are what make OpenGL work.

If it was up to the ARB to improve OpenGL, the API would never advance. The ARB is a committee, and committees are not particularly efficient, so it has (wisely) taken the role of focusing more on ratifying existing extensions than on creating new ones from scratch.

Many extensions are cross-vendor. All EXT extensions are, and many extensions that have single-vendor names have been implemented by multiple vendors. The recent addition of ARB extensions even means that some extensions are considered important enough to be official parts of the API (if in spirit only).

If there were no extensions, there would be little to no OpenGL innovation. Implementing a fixed API is hardly an inspiring goal. Microsoft bypasses this by releasing a new DirectX every year. OpenGL bypasses it with extensions.

Portability is not the only goal of OpenGL. There are also different types of portability. Using extensions does not hurt cross-OS portability. Our Windows and Linux drivers support the same extensions.

Extensions are optional. As an example, the recently-released OpenGL game MDK2 does not use any OpenGL extensions (not even multitexture). This didn’t hurt its popularity. [Because MDK2 doesn’t use extensions, I find it very humorous whenever I see any claims about how it’s such an advanced engine. A “T&L” checkbox does not make a game advanced.]

Extensions are a big part of what I do as an OpenGL driver developer. In fact, right now, I’m writing 3 new extension specs.

Extensions are good.

  • Matt

>Portability is not the only goal of OpenGL.
>There are also different types of
>portability. Using extensions does not hurt
>cross-OS portability. Our Windows and Linux
>drivers support the same extensions.

How are those BeOS drivers coming along?
Would be a shame if the fastest GL
implementation around only supported Radeon
and not GeForce.

Disclaimer: I have never once in my life even seen BeOS, or even talked to someone who had used it, much less used it myself. This message also represents personal opinions that should not be associated with the opinions of my employer.

That said…

I am extremely suspicious of the claims that the BeOS implementation of OpenGL is so miraculous. Most of the debate surrounding those claims has centered around the extremely false claim that Windows is somehow an inefficient platform to run OpenGL on.

In fact, the vast majority of OpenGL apps see little to no overhead from Windows itself.

The same goes for the claims of how “Linux OpenGL should be faster than Windows OpenGL because Windows is so slow!” Nope, they should run at the same speed; to claim anything else is just moronic anti-MS propaganda.

Now, our OpenGL driver is an ICD. We run the same ICD on Win9x, WinNT, and Linux. As such, if we were to port it to a new OS and run on the same system architecture, well, the performance would have to be… the same! (modulo a few minor issues, but the basic performance should be identical)

All of our competitors also use an ICD driver model on Windows. Interestingly, many have chosen to not use an ICD driver model on various other OS’s (be it Linux, Mac, BeOS, whatever).

An ICD is a full implementation. Therefore, an ICD is always capable of being faster than an MCD-like driver model.

So when someone comes along and says that an MCD-like driver model (like the one on BeOS) is outperforming Windows drivers, this means one of two things:

  1. Gross incompetence in writing the Windows driver
  2. False performance claims about the MCD driver

In the case of BeOS, I am strongly leaning towards (2). Simply put, a well-engineered ICD will always be faster than an MCD.

The BeOS runtime has all these wonderfully magical SSE optimizations? No matter; an ICD could have the same ones, and in fact better ones, because they’d be tuned to fit the HW in question.

MCDs also make it much harder to put extensions in the driver, especially when the runtime is supplied by another company.

As for BeOS support itself, well, I have not really seen much clamoring for it. Remember that supporting a new OS would take away resources from other areas. I know I’m already overworked; I really have no desire to see yet another responsibility piled on the backs of our SW team unless it will open up big doors for our company. Linux has the opportunity of doing that – it’s got the whole IRIX to Linux transition for workstation folks behind it, as well as the geek factor. BeOS, as far as I can tell, falls short on that standard. Remember my disclaimer? For me, BeOS is pretty much off the radar; I hear a tiny bit of news about it every few months on the Internet. (I’m not much of an alternative-OS person, if you can’t tell. I hate all forms of Unix with a passion. I only use Win2K, and I love it. Works perfectly for all my needs.)

As a small addendum: Yes, there is some overhead in running OpenGL on Windows. There are two aspects of this overhead: TLS overhead and opengl32.dll overhead. Fortunately, TLS overhead is nonexistent on WinNT and Win2K. And opengl32.dll overhead is unmeasurable except for immediate-mode applications. The opengl32.dll overhead could have been avoided by MS, I believe, but it’s hardly significant.

Another addendum: Direct3D has an MCD-like driver model. Many of the problems of D3D are problems in the MS runtime, not in vendor-supplied drivers themselves. The MCD driver model diffuses responsibility among more vendors, reducing quality. It is tempting to blame MS, but I think the problems would have happened no matter who had maintained the runtime.

  • Matt

I can shed a little more light on the claims
of BeOS OpenGL being much faster than
Windows. Specifically, this is for the case
where GL does the T&L, as measured on the
voodoo3 card. I have not seen any other
numbers. (And these numbers were from a pre-
release version which you still can’t go out
and buy.) I assume the soft T&L code on BeOS
was written by space aliens with superior
intelligence, and therefore runs faster.

That being said, the market for the operating
system formerly known as BeOS (now re-
packaged as an internet appliance OS named
BeIA) is probably currently not on the kinds
of devices that can afford your graphics chips.

But, as a happy BeOS user (I’ve been all
over the map, and now use BeOS/Windows about
50/50 with Linux relegated to my server) I
would still like for my GeForce2 to come up
in accelerated high-resolution mode, and not
VESA 1024x768x60 Hz. And, ideally, do GL in
hardware mode.

Alien technology… oh no, he’s found our secret! Quick, hide that Roswell probe we bribed all those top government officials to obtain!

Seriously now, yes, there is optimization potential with SW T&L, but I question how much, and whether it would be better accomplished in the driver itself.

With HW T&L, intermediate layers between the driver and the app start to become a real problem. When you are using vertex arrays, a HW T&L driver is really just a glorified memcpy. Much more complicated than a memcpy, but the idea is the same.

Unless the BeOS folks have discovered an x86 instruction Intel has been hiding for years that lets you magically transfer data faster, well, a memcpy is a memcpy.

  • Matt

Are you guys fighting? Sorry, but I don’t know what you are talking about. Well, I didn’t mean that extensions are a bad thing, but they could have been designed in a way that guarantees the programmer that his program will run on a wide range of computers. These extensions should include a software implementation of the hardware feature, so that the implementation can automatically switch to it when it doesn’t find the suitable hardware. Somewhat like DirectX (HAL and HEL).
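A hand-rolled version of that HAL/HEL-style switch is actually straightforward: query the extension once at startup and bind a function pointer to the appropriate path. A sketch, with all function names made up for illustration:

```c
#include <stdio.h>

/* Hypothetical two-path renderer: one function pointer, bound once at
 * startup depending on whether the hardware feature was found.
 * This is the hand-rolled equivalent of a HAL/HEL split. */

static void draw_multitexture_hw(void) { puts("one hardware multitexture pass"); }
static void draw_multipass_sw(void)    { puts("two standard GL passes"); }

/* The rest of the renderer calls through this pointer and never needs
 * to know which path was chosen. */
static void (*draw_base_and_lightmap)(void);

static void bind_renderer(int have_multitexture_ext)
{
    draw_base_and_lightmap = have_multitexture_ext
                           ? draw_multitexture_hw
                           : draw_multipass_sw;
}
```

The difference from the DirectX model is only who writes the fallback: here the application author does, rather than the runtime vendor.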

>Unless the BeOS folks have discovered an
>x86 instruction Intel has been hiding for
>years that lets you magically transfer data
>faster, well, a memcpy is a memcpy.


Have you ever timed memcpy() on Windows? It’s
pathetic. It gets like 75 MB/s on a CuMine
PIII with PC133 memory. I believe they’re
moving longwords or something.

Meanwhile, memcpy() on BeOS gets patched up
at load-time to an implementation optimized
for your hardware. I’m pretty sure the GL
implementation goes one step further than that. This attention to detail is one of the
reasons I like BeOS.

Anyway, I agree that when the hardware does
all the important stuff, the software layer
should be as thin as possible and get the
hell out of the way if it can.

My complaint is just that on one of my
machines, the one with the nVidia card, I
cannot get decent graphics performance when
running BeOS, whereas on the others, I do.
And, while the folks at Be did the drivers
for those other cards, they cannot do that
for nVidia, because you guys are too tight
with your specs. That may have all kinds of
business justifications, but at the end of
the day, you’re gonna hear about it from users like me.

I don’t know what ideas you have about memcpy, but my experience is that it is a system-memory-bandwidth limited operation, pretty much all the time.

Feature emulation is often a bad idea when it comes to 3D. If the feature is on the pixel level, you get a SW fallback. This is not what people want.

DirectX certainly does no better a job than OpenGL with these things. DirectX uses capability bits, which are usually poorly documented and confusing, while at the same time causing compatibility problems.

On the other hand, OpenGL drivers are required to provide a SW fallback for any feature they can’t support in HW.

The fact is, if you’re writing cutting-edge 3D, you have to code to the HW somewhat. There is no way around this. D3D has no way around it. OpenGL has no way around it. The only way around it is to ignore the HW completely and use a SW renderer, but I don’t think that’s what you want either.

  • Matt

> I don’t know what ideas you have about
> memcpy, but my experience is that it is a
> system-memory-bandwidth limited operation,
> pretty much all the time.

Coming from a driver writer I have so far
had reason to respect, I’m surprised by this
statement. Computer architecture has come a
LOOONG way since moving words at buswidth
size was the most efficient way. However,
that’s still what the MS C run-time does,
which on typical Pentium III and better
hardware comes nowhere near the theoretical
throughput of the memory subsystem.

If you don’t believe me, just code up a
simple test using QueryPerformanceCounter()
and memcpy() between the same blocks of
memory at 4k, 64k and 1024k sizes. It is
very illuminating. Makes you change your
assumptions about “trusting the libraries”
versus doing it yourself.

> The fact is, if you’re writing cutting-
> edge 3D, you have to code to the HW
> somewhat. There is no way around this.

Absolutely. And I want to be able to do it
on BeOS in addition to Windows. With my
Voodoo cards, I can. With my G400 I could
(before I replaced it). With my Radeon I
could (but it didn’t work under Windows).
With my GeForce I cannot.

If a databook for NV1x chips were to be
leaked, or properly provided under NDA, or
just some source changes hands behind the
second oak from the west in golden gate park,
I could give you just the e-mail address to
talk to.

Oh, and please try that timing of memcpy()
in the VC++ standard C runtime library, and
compare it to what you could roll on your
own using Pentium-III SIMD. It’s worth it.

I can’t try the memcpy experiment right now, but I might be able to in January.

I can believe that rep movsd does not run at full speed. But I would think that, on a P3, the speeds you would get by using a partially-unrolled loop using (1) SSE, (2) MMX, and (3) plain old x86 are all roughly comparable (within 20%, at least).

I know that I’ve seen MMX and x86 achieving roughly similar speeds for memory fills (fill a block of memory with a single 32-bit constant). I haven’t ever tried them for copies.

It’s true that direct bus DWORD reads and writes are going to be slow. When you have a 64-bit FSB, writing a DWORD to uncached, non-write-combined memory is essentially guaranteed to be suboptimal, for example.

For cached memory, all transactions are a cache line at a time. Whether you read 32, 64, or 128 bits at a time, you still get 32 or 64 bytes (depending on the x86 in question). I haven’t thought through the write cases in full; in that case, you do have to worry in that you don’t want to bring in a cache line from memory if all you’re going to do is overwrite it all.

For write-combined memory, it is meaningless to talk about reads. For writes, so long as you write sequential aligned 32-bit or 64-bit words, you’re fine – the write combiner is your friend.
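To make the “sequential aligned words” point concrete, the copy loops being discussed look roughly like this (a portable C sketch, not actual driver code; 8-byte alignment of both pointers is assumed):

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n 64-bit words with a loop unrolled by 4.  Writing sequential,
 * aligned words is exactly the access pattern that keeps a write
 * combiner happy on write-combined memory. */
static void copy_qwords(uint64_t *dst, const uint64_t *src, size_t n)
{
    size_t i;

    for (i = 0; i + 4 <= n; i += 4) {   /* main unrolled body */
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)                  /* remaining tail words */
        dst[i] = src[i];
}
```

A real driver would pick word width and unroll factor to match the FSB and the chip in question, which is Matt’s point about tuning the copy to the HW.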

When I said coding somewhat to the HW, what I meant is that you can’t write generic OpenGL code, not that you have to write to the specific chip.

Trust me, you do NOT want a full reference manual for one of our chips. I wouldn’t wish that on anyone.

If you think it’s as simple as giving someone “the register specs”, nope, it isn’t.

Think The X-Files. You know those episodes where Mulder and Scully are talking to an informant, and the informant starts to hesitate, fearing that the information is just so sensitive and so dangerous, and reconsiders and starts telling Mulder and Scully that “these doors just aren’t meant to be opened”?

Well, our HW interfaces are just one of those doors that isn’t meant to be opened. There are far too many reasons of all sorts (ranging from legal constraints to technical constraints to incentive constraints to practical constraints to authority constraints) that it’s a topic on which I must “Just Say No.”

  • Matt

>Trust me, you do NOT want a full reference
>manual for one of our chips. I wouldn’t
>wish that on anyone.

I’m touched by your concern for my well-
being. :) However, if I got these, I would
just carry them (blind-folded) to the people
who did the i810 driver, Radeon driver, G400
driver, etc. I would be immune from the brain
warping radiation.

Me? I’m just a multimedia guy. That’s not to
say that sound chips aren’t getting
complicated these days. The SB Live has a
DSP which gives you 512 operations per
sample, for example. And some newer “cheap”
CPU cores come with programmable MPEG decode
blocks. But I digress.

A linkable ELF .o with a header file would
be quite acceptable, though. As long as it
used external references for things like
spinlocking, memory mapping, etc, because
the BeOS kernel programming model is… uh…
less archaic than Linux.

I know, I know, it’s wishful thinking. But it
is Christmas, after all. At least, I’d like
to thank you for showing you care about
users, even if in the end you can’t deliver.