Official Bindless Graphics feedback thread

NVIDIA has just released a new and faster way of rendering with OpenGL, called Bindless Graphics. Bindless Graphics is a set of changes to OpenGL that can enable close to an order-of-magnitude improvement in how CPU-limited graphics applications are. Recent improvements in programmability have focused on additional flexibility in shaders (expanding formats to include more float and integer types, better branching support, etc.) and on enabling new features (geometry programs, transform feedback, etc.), and while some of these allow offloading parts of certain workloads to the GPU, they don't directly attack the issues that dominate CPU time.

Bindless Graphics is enabled through two OpenGL extensions: NV_shader_buffer_load and NV_vertex_buffer_unified_memory.

You can read the Bindless graphics tutorial for more detail.
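For a flavor of what the two extensions look like in practice, here is a rough sketch of the bindless vertex path based on the extension specs. It is not a complete program: it assumes a current GL context with the NV entry points loaded (via GLEW or similar), and `verts`/`numVerts` stand in for your own vertex data.

```c
GLuint vbo;
GLuint64EXT vboAddr;

glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, numVerts * 3 * sizeof(float), verts, GL_STATIC_DRAW);

/* NV_shader_buffer_load: pin the buffer and query its GPU address */
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &vboAddr);

/* NV_vertex_buffer_unified_memory: source attrib 0 from a raw GPU address
   instead of a buffer binding */
glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float));
glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, vboAddr,
                       numVerts * 3 * sizeof(float));

glDrawArrays(GL_TRIANGLES, 0, numVerts);
```

Switching meshes afterward is just another `glBufferAddressRangeNV` call with a different address, which is where the CPU savings come from.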

Let us know how it works for you!

(with my NVIDIA hat on)

Just a comment: a lot of updates from NVIDIA recently. It's nice to see all the updates and to browse the developer page (like a few years back).

Reading the spec right now; it seems nice, and the presentation is well made. Currently our application is bound more by the CPU and state changes than by raw GPU power, but I hope to test these new extensions next week and see if they make a difference.


How does this extension interact with Map/Unmap usage patterns? I use map/unmap so I can update the VBO data in another thread. Will it still work?

Is 8800 support planned? (All beta drivers I can find are for 9200+ only.)

Are these extensions GL 3 only? In the spec, there are examples using GLSL version 1.20, but the header states that the spec is written against OpenGL 3.0.

I was hoping that NV_shader_buffer_load would have been an extension which allows saving compiled shaders to disk and reloading them later!

I haven’t tried it yet, but it’s nice to see that binding blocks of uniforms will now be as fast as changing a pointer.

From the NVIDIA release: “Bindless Graphics is available starting with NVIDIA Release 185 drivers for hardware G80 and up.”

That’s what happens when you don’t eat your veggies. :o
Sorry bout that.

Bindless Graphics, aka (like) one of the good bits from LP.

Nice one NV, good to see that some of LP has survived in some form.

Just a shame that AMD won't get in on the act (I say this as an AMD GPU user, so no bias against them); because of that, it would be interesting to see how much use this would get in the 'real' world.

Very nice! Great job, NVidia!

That's really nice!
But one thing I wonder: since we are now able to get direct GPU global memory addresses, how does this interact with OpenGL SLI mode? Does automatic scaling with multiple GPUs still work?

What about VAOs with this? Oo

I spent post after post debating with Korval a while ago about how VAOs don't allow fast buffer switching, so I'm glad to see such a feature, but I'm a little confused about VAOs versus this now. It even seems to deprecate the new uniform buffers… Oo

I definitely need more information about all this; I don't actually know where to go now.

Well, it's only G80-class cards and up, so you'll have two very different code paths.
This stuff has actually got me excited about GL again. Thanks nvidia.

Wow, that’s daring stuff. Basically cuts open big holes through the thick mud that the current OpenGL API is :slight_smile:

I like it. If only it was universal.

All I can say is about time! As usual Nvidia leading the pack for OpenGL. This is why I buy Nvidia only due to their support for OpenGL vs. ATI. Just wish ATI would pick up the slack and get in gear… Here’s to wishing!!!

> How does this extension interact with Map/Unmap usage patterns? I use map/unmap so I can update the VBO data in another thread. Will it still work?

Yes, MapBuffer (and MapBufferRange) will still work, but please read issues 7-9 of NV_shader_buffer_load for more info.
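One conservative update pattern, written only as a sketch (consult issues 7-9 of NV_shader_buffer_load for the authoritative rules; `vbo`, `newData`, `size`, and `vboAddr` are hypothetical application state):

```c
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glMakeBufferNonResidentNV(GL_ARRAY_BUFFER);   /* stop GPU-address accesses first */

void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
memcpy(ptr, newData, size);                   /* e.g. data produced by another thread */
glUnmapBuffer(GL_ARRAY_BUFFER);

glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
/* Conservatively re-query the address before issuing new draws */
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &vboAddr);
```

As with any map/unmap scheme, the usual synchronization rules still apply: don't update a region while outstanding draw commands may still be reading it.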

> Does automatic scaling with multiple GPUs still work?

Yes, SLI can still work.

> What about VAO with this?

The new vertex attrib state is contained in the VAO (see the “New State” section), so these can be used in conjunction with VAO. However, it’s not clear that VAO will provide additional benefit since switching vertex attrib addresses should be cheap (Groovounet, some of your posts about VAO have been very insightful).

> Are these extensions GL 3 only?

They do not require a GL3 context, and should work with all the ARB_compatibility features.

Thanks for your answer.
Do you know if it’s currently working with NVIDIA R185 drivers ? If so, does it work in heterogeneous SLI configurations, with GPU not having the same amount of memory for instance ? Any idea of how it is implemented ? Does each GPU clone the same address space ?

OK, I had a bit of time to read and think through the specs, and I think I might have grasped it :slight_smile:

First of all, I think it's a step in the right direction. It offers new features that even D3D10 does not have. Issuing draw commands will get faster, leaving more CPU time for the application itself. Although the GPU will not render faster per se, there are now more possibilities for complex data structures used in the shaders.

I have a few points of criticism, though. First of all, the naming of the extensions is… just awkward.
Why not use "NV_shader_memory_access" (it's about shaders directly accessing memory, right?) and "NV_buffer_direct_memory" (I don't get the "unified" part…)?

I think it might be good not to provide the non-Named functions. We all know that bind-to-edit will die out some day, so why provide these obsolete semantics for brand-new functionality?! Just provide the Named-function semantics (but please, without the "Named" prefix).
Also, the specs refer to some functions of EXT_direct_state_access without mentioning a dependency on it.

Now we get pointers and pointer arithmetic in shaders. That makes it possible to create really complex data structures, like lists, trees, and hashmaps; the specs even talk about whole scenegraphs. But how are we supposed to debug such monsters?

Is the driver using the “sizeiptr length” parameter of BufferAddressRangeNV to determine that a certain region of memory is in use by drawing commands? If not, do we have to use fences for that (welcome back, NV_vertex_array_range)?

It seems that once a VBO is made resident, it can never become non-resident (unless the application tells it to be). How can GL make that guarantee? What is the use of a non-resident buffer, then? Does a resident buffer have any performance penalties?

I like the separation of vertex attribute format, enable, and pointer. VAO did not offer this and was therefore useless (for me). You could take it one step further and provide "vertex format" objects which encapsulate the format and enables (switching a whole stream setup with one command). I don't know if that would be a great benefit, though.

I’d like to suggest to move all the buffer-object related functions (MakeBufferResidentNV, IsBufferResidentNV, MakeBufferNonResidentNV) from NV_shader_buffer_load into NV_vertex_buffer_unified_memory. They feel more natural there. Additionally, IsBufferResidentNV may be replaced or accompanied by a new parameter to glGetBufferParameteriv().

Last but not least, IMHO the use of these new extensions probably has a fair impact on the design of a renderer. It cannot easily be made optional. Therefore, I will probably hesitate to actually make use of these features if they are not also available on ATI.

my 2c :slight_smile:

> the “sizeiptr length” parameter

D3D10 has some guarantees that accessing beyond the end of a buffer will not crash, this is providing something similar. If that’s not useful to you, you can use INT_MAX and ignore it.

> … once a VBO is made resident…

Issue 6 of shader_buffer_load discusses the purpose of residency. It shouldn’t adversely affect performance most of the time.

> probably has a fair impact on the design of a renderer.

The presentation on the developer site has some examples of how to port. With the vertex buffer extension it should be easy to maintain both codepaths, although I can appreciate that maintaining multiple versions of shaders has some cost for developers.

So, will there be new texture formats to render pointers into FBOs? That could be useful for deferred renderers to "render materials" for later lookup.

I guess I could also store all materials in a big array and use an integer from an FBO for the lookup; that should be flexible enough.

I find the extension very intriguing, probably the most interesting and powerful idea since the introduction of shaders. However, debugging is indeed a problem. With standard GL/GLSL a broken shader usually crashes the program; with this, I fear blue screens might become much more common. And debugging standard shaders can already be a nightmare today.


So it’s basically adding ‘GPU address pointers’.

  1. Handles now become direct GPU pointers on client side.
  2. Shading language now has C-like pointer syntax.

It's a very nice feature, but I really would have preferred that the wording be more direct (i.e. new feature: pointers to GPU memory!) so it is easier to read, understand and use! :slight_smile:
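To make point 2 concrete, here is a hypothetical vertex shader sketch showing the pointer syntax the extension adds; `colors` is an illustrative name, and the application would set it with glUniformui64NV to the GPU address of a resident buffer.

```glsl
#version 120
#extension GL_NV_shader_buffer_load : enable
#extension GL_EXT_gpu_shader4 : enable   // for gl_VertexID

uniform vec4 *colors;   // a raw GPU address, not a binding point

void main()
{
    gl_FrontColor = colors[gl_VertexID];   // dereferenced like a C pointer
    gl_Position   = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```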