Architecture: Why is OpenGL state-based?


Having taken a fairly superficial look at OpenGL as a developer in the recent past, I’m aware that it uses a state-based system: you set arguments/parameters up front, and the actual operations are applied afterwards, implicitly using those pre-set values in the relevant function call.

Is there anyone here who knows what the underlying technical reasons for that decision were? I am guessing this is an answer that would have to come from someone who has worked on the Development Team at some point.

The reason I ask this is that I currently see a possible need for a state-based system in an application I am building, and would like to know if my reasoning is near the mark.

Please – no guesses or suggestions – I would like to hear only from those who are certain (or very close to it) of the hard historical facts.

Many thanks,


OpenGL is an architecture designed to be performant on the hardware of the day. It was built on the experience gained from IrisGL, which was not formally designed but evolved with the hardware.

AFAIK the state machine closely modelled the hardware at the time (and vice versa), e.g., a certain state would correspond to a certain register (or a flag in the firmware).

I can think of two main reasons:

  1. It allows easy backwards compatibility at the source level. Suppose you develop some new hardware that does X. Instead of adding a new render entry point with an extra parameter for X, you simply add a state setting.

  2. It allows easy use of the API, since you can ignore the things you do not need or care about.

The main disadvantages of a state based system:

  1. Threads can get tricky
  2. State leaks - 3rd-party or other code sets some rendering state without resetting it to the default.

If you pick up the 3.0 spec, and then remove everything that is marked deprecated, do you arrive at the same assessment?

Restated - in a world where the programming model centers on shader programs and buffers full of data for them to operate on - there are a lot fewer state manipulations to be concerned with. (There are still calls to adjust the rasterizer settings - depth write, blend, etc.)

Besides, EXT_direct_state_access adoption by the ARB “will” go a long way towards eliminating much of the cruftier state and bind-to-edit ugliness that currently plagues the API. A good bit of the API and associated state has already been deprecated in GL3 - DSA will likely provide the coup de grace.

Yes, but there are other things that are unrelated to the hardware. Texture objects were made available in 1.1.
Why was this state-based? Why do you need to bind a texture before you can even create it?
Of course, we have wanted this to be corrected for a long time.
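For reference, this is the bind-to-edit dance being complained about, classic GL 1.1 style. It is a fragment rather than a runnable program: `w`, `h` and `pixels` are assumed to be defined elsewhere, and a current GL context is required.

```c
GLuint tex;
glGenTextures(1, &tex);              /* reserves a name; no texture exists yet  */
glBindTexture(GL_TEXTURE_2D, tex);   /* must bind before you can even define it */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);
/* Note: every edit goes through whatever is currently bound to the target,
 * so this also clobbers the previous GL_TEXTURE_2D binding as a side effect. */
```

With EXT_direct_state_access the same edit takes the object name directly, e.g. `glTextureParameteriEXT(tex, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);`, with no bind needed.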

Having worked on CG pipelines and proprietary hardware since before the days of OpenGL, it makes very clear sense to me that OpenGL retains a hangover from the days when hardware was driven much more directly, and was far more limited.

In those environments, state machines are simply the way to tie the SW to the HW as closely as possible. At the very simplest level you basically have a function in the API which literally clears or sets bits in a register on the silicon, and which probably has a very similar name to the actual command you are using.

As someone else has already commented, we are moving away from that now with newer GPUs, shaders, and OpenGL 3.

At the driver level though things like ‘binding’ will still go on, as it may be necessary to page something in, or switch banks, fire off DMA etc.

Until only recently (prior to the advent of the iPhone etc.) the reliance on the old API would, IMO, have made putting GL on portable devices easier, as a lot of those had mobile versions of hardware which were more similar to old hardware than to our current multi-core beasties.

In some ways this move to pull us away from the silicon is a shame. I am sure I am opening a can of worms here, but anyone who says that modern compilers can out-think what good low-level programming can do is deluding themselves… I have just spent the last week thrashing the best optimizations my compilers can come up with by hand-coding SSE myself.

I am sorely tempted to go do the same thing with my GPU next!

Why did you not do it?
GL_ARB_fragment/vertex_program has been around since 2002 and did not magically stop working with the arrival of Cg/GLSL.
Most of the time I don’t care that much about the last 5% of performance and will be more than happy when the state ugliness goes away.

You are right, of course… except that vectorization by hand with SSE yields 50-100% speed increases over even a lot of supposedly auto-vectorised compiler code.

I stuck with GLSL up until now for several reasons:

a) It’s a very good base to learn from. I was not familiar with OpenGL at the time.
b) I was told that GL_ARB_fragment/vertex_program is essentially crippled, and not a real machine language. Perhaps I have been misled.
c) I was also told (probably by the same people who think C compilers are the center of the Universe) that GLSL compilers are so heavily optimized that it’s not really worth trying to beat them…

Now, where have I heard c) before? :slight_smile:
But to be honest I gave that comment some credibility because of b).

Rest assured I will be experimenting.

For my part, I believe that languages like C are not optimal as a base language for parallel computers. I still think that a “declarative” language, where data and the code that produces that data are equivalent (i.e., SSA form), provides a more convenient base to optimize from. Basically, a “good” language for parallel processing shouldn’t allow any side effects. Of course, these are just speculations, as I haven’t done any formal research on the topic; it’s just that my experience with different programming languages gives me a strong intuition in this direction.

This I did not know. I tried beating my pascal compiler (don’t really like C/C++; it’s good enough for my work but for my private projects I’d rather use a language I like) by creating the SSE code myself and failed miserably.

Concerning c):
I heard something very different. Sometimes just moving your code around may increase/decrease performance. AFAIK there are people/companies using GLSL as a prototyping language who then go the ‘machine language’ way for their release. But you should take this with a grain of salt.

My experience:

I am using Cg to generate vertex and fragment programs.
I must say NVIDIA did a great job. The code is exceptionally well optimized. It is hard to find a place to modify that results in a performance gain in the ShaderPerf tool.

You can maybe lower the number of instructions, but in the end the performance reported by the NVShaderPerf tool is the same.

[QUOTE]This I did not know. I tried beating my pascal compiler (don’t really like C/C++; it’s good enough for my work but for my private projects I’d rather use a language I like) by creating the SSE code myself and failed miserably.[/QUOTE]

Alignment plays a part, although I have not seen particularly massive differences between using aligned and unaligned x86 instructions.

There are also times when standard (non-packed) instructions work out quicker. gcc will compile four successive float adds as just that, and even inlined, aligned blocks of asm and data using vector adds will not beat it on my Core Duo. HADDPS is another instruction that can be beaten by four instructions of unpacked shifts and adds.

But even with function call overhead, for things like vector floors or vector cubics and so on, I have found you can beat an inlined C function by a factor of 3.

You do need to benchmark, and yes, offsets to data, page switches etc. can all affect it. So it’s something to do once a library or project is nearing completion, so you can splice things together neatly.

The golden rule is to get data into the CPU as quickly and efficiently as possible, and then do lots of internal instructions, then get it back out as quickly and efficiently as possible. Although even then there are strange pitfalls. I have found that moving 4 floats on occasion can be quicker than moving one float out of a 4 wide register! Go figure!

Then when you think you have it all under control you learn about register and memory clobbering!

Any routine I write I benchmark against my best C++ over billions of iterations, trying both inline and function versions, and also changing its location and context within the benchmark app. But for a 50% increase in the speed of your math library it is certainly worth it.

Bears out what I was saying above.
Again, I will look at asm for the GPU once my shaders are complete. I would not want to prototype in asm there. C is much easier for that.