Allocating a buffer of more than 2GB?

Hi,
Is there any way to create a buffer of more than 2GB with OpenGL?
Given that most graphics cards usually have more than 4GB or even 8GB of memory, it seems weird to me that OpenGL reports a limit of 2GB for the following fields (using a recent nVidia card):

GL_MAX_TEXTURE_BUFFER_SIZE
GL_MAX_SHADER_STORAGE_BLOCK_SIZE

Isn’t there a way to create a very large, single, contiguous buffer?

Thanks,
Fred

The size of a buffer allocation is not the same thing as the size of data that a shader can access. The former is the largest size you can give to glBufferData (i.e., the allocation size). The latter is governed by the limitation you’re talking about. You can (probably?) allocate more than 2GB in a buffer; you simply can’t glBindBufferRange all of it for SSBO usage.
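
Roughly, the distinction looks like this in code (a sketch only; the 3GB size and binding index 0 are placeholders, and a GL 4.3+ context with a loader such as GLEW or glad is assumed):

// Allocating the storage: limited only by what the driver/OS will give you.
GLuint buf;
glGenBuffers(1, &buf);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buf);
glBufferData(GL_SHADER_STORAGE_BUFFER, (GLsizeiptr)3 << 30, NULL, GL_STATIC_DRAW);

// Exposing it to a shader: the range bound to one SSBO index is what
// GL_MAX_SHADER_STORAGE_BLOCK_SIZE is about, so stay within that limit here.
GLint64 maxBlock = 0;
glGetInteger64v(GL_MAX_SHADER_STORAGE_BLOCK_SIZE, &maxBlock);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 0, buf, 0, (GLsizeiptr)maxBlock);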

Though it’s interesting to note that many Vulkan implementations allow you to use more than 2GB for SSBOs. I have no idea why.

Good question. If there is, I don’t know about it.

An interesting, related question is which desktop GPUs support indexing buffers on the GPU with 64-bit offsets and addresses.

Related to NVIDIA and 64-bit shader access into buffers, there appears to be a tie-in with the presence of NV_gpu_shader5 support and 64-bit integer support in shaders. That support seems to go back to at least the GTX 4xx days (Fermi). A rough sketch of what that access looks like follows the extension list below.

NV_shader_buffer_store (3/2010)

NV_shader_buffer_load (8/2009)

NV_vertex_buffer_unified_memory (6/2009)
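
For completeness, a very rough, untested sketch of what access through those extensions looks like (function and enum names are from the extension specs; ‘buf’ and ‘prog’ stand in for an existing buffer and linked program):

// Requires NV_shader_buffer_load; GPU addresses are 64-bit, so the 2GB
// range limits of SSBO/UBO binding points don't come into play.
GLuint64EXT gpuAddr = 0;
glBindBuffer(GL_ARRAY_BUFFER, buf);
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &gpuAddr);

// Pass the raw address to the shader; with NV_gpu_shader5 the GLSL side can
// declare something like:  uniform ivec4 *data_ptr;  and index it directly.
glUseProgram(prog);
glUniformui64NV(glGetUniformLocation(prog, "data_ptr"), gpuAddr);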

Thanks for your replies.

Dark Photon, using the old NV* extension functions (good find!) may work, but it’s impractical in my situation.

Alfonse, you identified something I hadn’t thought about. There are three things:

  • the allocation size
  • the glBindBufferRange maximum size
  • the size accessible by shaders

While the allocation size can indeed go beyond 2GB (hmmm, need to double check that…), and while I may work around the glBindBufferRange limitation (easy enough, by calling the function multiple times), doesn’t GL_MAX_SHADER_STORAGE_BLOCK_SIZE limit the maximum size my shader will have access to? For this, I can’t do anything, can I?
What do you think?

These are the same thing. When you call glBindBufferRange, you are saying “the shader can use this range of this memory”. You’re not allowed to call this with a size greater than what the shader can access.

No, you can’t.

If you call glBindBufferRange to the same index of the same binding target, you will be overwriting that index with a new buffer range. If you use a different index, that means you have two SSBOs in the shader, since each index maps to a specific SSBO in your shader.

Ok, so in this case, assuming I somehow manage to allocate more than 2GB, how could I concretely map the buffer contents for updating? Quite frankly, I don’t see how.

All at once from a single SSBO in your shader? You can’t.

You can bind any 2GB portion of the buffer’s storage to any (appropriate) SSBO binding point. But the size of the range for any binding call must be within the limitation you cited.
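
A sketch of what that looks like with a single buffer split across two binding points (the split sizes and block names are just illustrative; the offset must respect GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT):

// One buffer bound twice: each range stays within the block-size limit.
// 'buf' is the existing buffer of totalSize bytes (assumed > 2GB).
GLsizeiptr half = totalSize / 2;
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 0, buf, 0, half);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, buf, half, totalSize - half);

// GLSL side (sketch): two blocks, one per binding point.
//   layout(std430, binding = 0) buffer LoHalf { ivec4 lo[]; };
//   layout(std430, binding = 1) buffer HiHalf { ivec4 hi[]; };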

OK, great, thank you, that seems like a workable path. I will definitely try that. I won’t avoid having two binding points, but I can avoid having two different buffers.
Thanks again Alfonse and Dark Photon.

I have just noticed that Intel HD adapters have a limit of 128MB for GL_MAX_SHADER_STORAGE_BLOCK_SIZE and a limit of 16 binding points. That’s a maximum of 16*128MB=2GB worth of memory accessible by a shader at any one time.
There seems to be nothing that I can do to access more than 2GB of memory from a shader on Intel adapters, which is a shame.
I noticed the same limitation applies for Vulkan.
The only way to use more memory from GPU code is basically to use OpenCL (and forget about rasterization of course).
:(
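
For reference, the limits in question can be queried like this (a minimal sketch; the variable names are mine):

// Needs <stdio.h> for printf.
GLint64 maxBlockSize = 0;
GLint maxBindings = 0, maxTexBufTexels = 0;
glGetInteger64v(GL_MAX_SHADER_STORAGE_BLOCK_SIZE, &maxBlockSize);
glGetIntegerv(GL_MAX_SHADER_STORAGE_BUFFER_BINDINGS, &maxBindings);
glGetIntegerv(GL_MAX_TEXTURE_BUFFER_SIZE, &maxTexBufTexels);   // counted in texels, not bytes
printf("block size: %lld bytes, SSBO bindings: %d, texel buffer: %d texels\n",
       (long long)maxBlockSize, maxBindings, maxTexBufTexels);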

Hmmm. I wonder…

From: https://opengl.gpuinfo.org/

I meant, “There seems to be nothing that I can do to access more than 2GB of read/write memory from a shader on Intel adapters, which is a shame.”
GL_MAX_TEXTURE_BUFFER_SIZE is 134217728 on my Intel HD Graphics 520 adapter, meaning I can access only up to 512MB of read/write memory through texture buffers.

https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_shader_storage_buffer_object.txt

Here is what the spec says:

“The total amount of buffer object storage that can be accessed in any shader storage block is subject to an implementation-dependent limit. The maximum amount of available space, in basic machine units, can be queried by calling GetIntegerv with the constant MAX_SHADER_STORAGE_BLOCK_SIZE. If the amount of storage required for any shader storage block exceeds this limit, a program will fail to link”

Also of interest, this remark from user BDO on stackoverflow:

“Since you do not declare the shader storage block with a fixed size, I wouldn’t know how the glsl compiler should check for this”

I’m not sure how that’s interesting. He’s just pointing out that the shader compiler cannot verify a limitation determined at runtime.

Ignoring GL_MAX_SHADER_STORAGE_BLOCK_SIZE, like the OP in the stackoverflow thread, I did some tests. Just to be precise, my SSBO is declared with an unsized array of ivec4’s (i.e., { ivec4 data[]; } my_data); a fuller shader sketch follows the list below.

  1. on a GeForce GTX 1080, I can indeed allocate a 3GB buffer, glMapBufferRange it, and read the buffer in my shader. ‘int’ indexing works through my_data.data[index][offset] and lets me read the whole 3GB (that’s the advantage of ivec4’s, obviously, i.e. to access up to 8GB)
  2. on an Intel HD Graphics 520, I tried to do the same with a 1GB buffer, and it works. It failed with an OpenGL out of memory error 1285 with a 2GB buffer when calling glTexImage2D, so that’s an unrelated error.
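
For reference, a minimal version of the test shader, as mentioned above (everything except the { ivec4 data[]; } my_data block is illustrative):

// Fragment shader used for the read test, as a C string literal.
static const char *fs_src =
    "#version 430\n"
    "layout(std430, binding = 0) buffer Data { ivec4 data[]; } my_data;\n"
    "uniform int index;\n"   // which ivec4 to read
    "uniform int offset;\n"  // which of its 4 components (0..3)
    "out vec4 color;\n"
    "void main() {\n"
    "    int v = my_data.data[index][offset];\n"
    "    color = vec4(float(v));\n"
    "}\n";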

So this makes me think that either:

  • we here do not understand the limitations and spec wording for values such as GL_MAX_SHADER_STORAGE_BLOCK_SIZE,
  • the OpenGL spec is unreadable or impossible to understand,
  • or both Intel and nVidia fail to honor the limitations advertised by their implementations, which seems unlikely to me.

After seeing my answer to the question you linked to, I did some digging into the standard. I found “Table 6.5: Indexed buffer object limits and binding queries”. In the section of the table for SSBOs, it specifically says:

size restriction: none

As such, what you’re doing is perfectly valid. It seems that the size restriction is only for the static size of an SSBO’s block definition… in OpenGL.

It should be noted again that Vulkan does have a specific limitation on the buffer range you can use. So it seems clear that despite the OpenGL specification, there is a hardware limitation on the appropriate range.

If it’s at all possible, I would suggest doing something weird. Check what the Vulkan limitation is (just by querying info from the VkPhysicalDevice; no need to create a VkDevice) in your OpenGL program and adhere to that.
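
Something along these lines, roughly (error handling and instance extensions omitted; only the first enumerated GPU is queried):

#include <vulkan/vulkan.h>
#include <stdio.h>

int main(void)
{
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    VkInstance inst;
    vkCreateInstance(&ici, NULL, &inst);

    uint32_t count = 1;
    VkPhysicalDevice dev;
    vkEnumeratePhysicalDevices(inst, &count, &dev);   // just grab the first GPU

    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(dev, &props);
    printf("maxStorageBufferRange = %u\n", props.limits.maxStorageBufferRange);

    vkDestroyInstance(inst, NULL);
    return 0;
}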

I saw the discussion on GitHub as well:

I did a few more tests on my Intel laptop:

  1. maxStorageBufferRange is 4294967295 on Vulkan (interestingly, maxTexelBufferElements remains at 134217728, but we don’t care here)
  2. I tried to compile and link a program with a huge static SSBO array size (e.g. { ivec4 data[100000000]; }, which is about 1.6GB of data) and it succeeds. glLinkProgram just takes an enormous amount of time to complete, but does not complain at all. This means that the Intel implementation does not even look at GL_MAX_SHADER_STORAGE_BLOCK_SIZE… I wonder where this limit is actually checked in their code…

Do you know what PDaniell means by “Accesses beyond this limit are covered under the out-of-bounds access behavior defined in the GL spec”?
How can I deterministically, programmatically check that buffer accesses are working from the shader? Shall I enable robust buffer access? Theoretically, if I enable robust buffer access, the implementation must return 0 for data read operations beyond either specification or hardware limits, right?
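
One quick way to check whether the context you ended up with actually has robust access enabled (a sketch; requires GL 4.5 or KHR_robustness):

GLint flags = 0;
glGetIntegerv(GL_CONTEXT_FLAGS, &flags);
if (flags & GL_CONTEXT_FLAG_ROBUST_ACCESS_BIT) {
    // With robust buffer access, out-of-bounds reads are defined to return
    // zero or values from elsewhere within the bound buffer, and the access
    // cannot terminate the application.
}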

On a GTX 1080 with drivers 471.41, attempting to compile and link a fragment shader with:

{ ivec4 data[67108864]; } my_data;

causes a crash. The glLinkProgram call does return, but calling glGetError afterwards (or glGetProgramiv with GL_LINK_STATUS) results in a crash. I am sure no errors are reported earlier, because I call glGetError after every single GL call in my code.

67108864*16=1,073,741,824 (1GB of data). Doing this should work, because the nVidia driver reports GL_MAX_SHADER_STORAGE_BLOCK_SIZE=2,147,483,647.

I just tested with robustness enabled in the GL context, and it does not change anything.

Similar behavior here on NVIDIA, though for me glLinkProgram() hangs for 28 seconds and then trips an Access Violation down in the driver.

I get the same behavior whether I try to shovel this large an explicit array dimension (64M * 16 = 1GB) into a UBO or an SSBO (…ignoring whether it’s even valid to do so).

It does seem like the NVIDIA GLSL compiler should probably trip an error here rather than just hang+crash in glLinkProgram().


I do all my reading from a giant QUAD primitive, processed through a glDrawElements(). I am just making sure I don’t go beyond the various limitations involved (GL_MAX_TEXTURE_SIZE, GL_MAX_VIEWPORT_DIMS, GL_MAX_RENDERBUFFER_SIZE and the like).

Reading a 3GB SSBO from a shader seems to be no problem on nVidia hardware.

When reading a 5GB SSBO, all creation steps succeed (buffer allocation, shader compilation and linking), and glDrawElements is called and returns normally, but the shader reads unexpected data beyond exactly the 2GB mark. This makes little sense to me, since reading a 3GB buffer, as I said, works fine from start to end.