Direct State Access

I have been looking into Direct State Access, which sounds pretty exciting. Now I am wondering whether I am actually understanding it correctly:

Let’s take the example of glDrawElements. It normally requires binding a vertex VBO and an index VBO first. The cost of doing so is noticeable, so a decent engine tries to keep its draw calls sorted by vertex/index buffer usage (not to mention textures, etc.).

Does this cost (and consequently the need for the sorting or queueing infrastructure) more or less go away with Direct State Access?
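For reference, the classic non-DSA path I mean looks roughly like this (a sketch assuming a valid GL context and old-style vertex arrays; `vbo`, `ibo`, `Vertex` and `index_count` are placeholder names):

```c
/* Bind-to-draw model: both buffers must be routed through global
 * bind points before glDrawElements can source data from them. */
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (void *)0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, (void *)0);
```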

It would be really great not to have to worry so much about how many times textures and vertex buffer objects are shuffled around: if not for speed, then for the significant simplification of the code.

Erm, no, you have misunderstood the intention of this extension completely.

DSA doesn’t have anything to do with performance. Using DSA won’t speed up your app, so you will still need to keep sorting. That doesn’t change.

It only makes the API easier and less painful to use. That’s it: just more convenience.

OK, in SOME rare situations DSA does give a speedup, but only in situations where non-DSA code would have needed to query a lot of state, change it, and then restore the queried state afterwards. This is supposedly what nVidia’s Cg does internally. But this speedup is only on the driver side (CPU), not on the GPU.
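The query/change/restore pattern meant here, with texture state as an example (a sketch assuming a GL context and the EXT_direct_state_access extension; `tex` is a placeholder texture name):

```c
/* Non-DSA middleware must save and restore the binding it clobbers: */
GLint prev;
glGetIntegerv(GL_TEXTURE_BINDING_2D, &prev);               /* query   */
glBindTexture(GL_TEXTURE_2D, tex);                         /* change  */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glBindTexture(GL_TEXTURE_2D, (GLuint)prev);                /* restore */

/* With EXT_direct_state_access the object is addressed directly,
 * so no query/restore round-trip through the driver is needed: */
glTextureParameteriEXT(tex, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
```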

Sorting by state needs to be done to keep the GPU busy and to avoid introducing pipeline stalls. DSA can’t change that.

Also keep in mind that the DSA extension is still experimental and not finished. If you start using it now, you might need to change your code later, when it becomes standardized. Also, I don’t know whether it is already supported on ATI.


Exciting: no more.

The way I understand it now, Direct State Access will clean things up and represent some sort of evolution, but certainly not a revolution.

I wonder why no one seems to address the fact that the whole “object/state change = cost” issue is a major flaw of both OpenGL and DirectX. Is this directly related to how the hardware works? If so, then it’s a hardware flaw.

Back in the software rendering days, using a pointer to a certain memory area containing vertex data wasn’t a cost at all; it was barely an assignment to a variable. Passing that memory area through one chain of functions or another, to render it with one “shader” or another, wasn’t a cost either, just a switch statement or something like that.

But I digress! :slight_smile:
Thanks for explaining.

It is a hardware restriction, and it is the reason why hardware can be that fast in the first place. You wouldn’t expect a Ferrari to go top speed while driving off-road, would you?

Learn more about how hardware works and you will understand it.

You really can’t compare hardware-rendering with software-rendering. Neither feature-wise nor performance-wise.

But with the flexibility of modern shader pipelines, more and more of these restrictions are being lifted, or become easier to overcome. Maybe in a few years it will actually be possible to render most things without sorting.


I wonder why no one seems to address the fact that the whole “object/state change = cost” issue is a major flaw of both OpenGL and DirectX. Is this directly related to how the hardware works? If so, then it’s a hardware flaw.

It’s not a flaw; it’s a tradeoff. It’s like saying it’s a flaw that some algorithms can be sped up by making them use more memory.

First, OpenGL and Direct3D are designed to abstract away implementation details. That way, you don’t have to know what specific registers or marshaling behavior ATI or NVIDIA hardware has or needs; it just works. Abstractions impose a performance penalty, but the penalty is usually minimal (and you’d have to incur it in your own code anyway even if you were coding directly to the hardware).

Second, GPU memory and registers don’t belong to you. You must access them over a (relatively slow) PCIe bus. The APIs are designed to minimize the impact of this as much as possible, but there will always be some performance cost simply due to having limited bandwidth to the GPU.

The thing most likely to reduce state change overhead is GPUs integrated with the CPU, like Intel’s integrated graphics or AMD Fusion.