Sprite atlas, instancing and uniform buffer size

I want to implement textures in my multi-instanced rendering. According to some posts here (sorry, the site doesn’t allow me to paste links) I decided to go with the atlas approach. I already implemented atlases in my C++ code and they work fine, but I ran into a problem with the shader.
This is how my shader looked before

#version 450

uniform mat4 MVPs[256];
attribute vec3 vCol;
attribute vec2 vUV;
attribute vec3 vPos;
varying vec3 color;

void main()
{
    gl_Position = MVPs[gl_InstanceID] * vec4(vPos, 1.0);
    color = vCol;
}

And how it looks now

#version 450

layout(location=000) uniform mat4 MVPs[100];         // 16
layout(location=100) uniform int atlasIDs[100];      // 1
layout(location=200) uniform vec2 atlasScales[100];  // 2
layout(location=300) uniform vec2 atlasOffsets[100]; // 2

in vec3 vCol;
in vec2 vUV;
in vec3 vPos;

out vec3 color;
out vec2 uv;

void main()
{
    gl_Position = MVPs[gl_InstanceID] * vec4(vPos, 1.0);
    color = vCol;
    uv = vUV * atlasScales[atlasIDs[gl_InstanceID]] + atlasOffsets[atlasIDs[gl_InstanceID]];
}

atlasScales and atlasOffsets describe how the current instance’s UV should be transformed to fit into an atlas entry; atlasIDs holds the atlas entry id for each instance.
As you can see, I had to decrease the number of MVPs (available instances to draw). I also had to greatly decrease my atlas size from the ±1024 I planned (just in case, I’m not sure) to 100. If I increase the size of one of these arrays by about 100, my game renders nothing at all (black screen), which leads me to the conclusion that it’s just a data size overflow.

This is how my renderer system works

* Game
* |
* +--basic_shader <- shader
* |  |
* |  +--player_model <- mesh
* |  |  |
* |  |  +--player_1 <- instance
* |  |  |
* |  |  +--player_2
* |  |
* |  +--wolf_model
* |     |
* |     +--wolf_1
* |     |
* |     +--wolf_2
* |     |
* |     +--wolf_3
* |
* +--transparent_shader
*    |
*    +--bottle_model
*       |
*       +--bottle_1

and I’m also not sure where to inject the textures here.
Am I going in the right direction?

I’m thinking about splitting my mesh-to-model map into some sort of instance packs. So if I have an instance limit n and I want to render more than n instances, I simply put the rest in a different render pass.
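
The instance-pack idea could be sketched roughly like this. It is only an illustration: InstanceBatch and split_into_batches are made-up names, not part of the renderer described above, and each batch would then get its own uniform upload and draw call.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one draw call's slice of the instance list.
struct InstanceBatch {
    std::size_t first;  // index of the first instance in this batch
    std::size_t count;  // number of instances in this batch (<= limit)
};

// Split `total` instances into consecutive batches of at most `limit` each.
std::vector<InstanceBatch> split_into_batches(std::size_t total, std::size_t limit) {
    std::vector<InstanceBatch> batches;
    if (limit == 0) return batches;  // no sensible split without a limit
    for (std::size_t first = 0; first < total; first += limit) {
        batches.push_back({first, std::min(limit, total - first)});
    }
    return batches;
}
```

With a limit of 230 (the MVPs array size used later in the thread), 500 instances would end up as batches of 230, 230 and 40.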

First: regarding uniform storage limits, you can merge atlasScales and atlasOffsets into a single array of vec4s, which requires the same storage as one array of vec2s, or half the storage of two arrays of vec2s. But as you’re requiring GLSL 4.50, you can use SSBOs, which have no practical limit other than available memory. This also allows you to pack the atlasIDs array, requiring only 4 bytes per entry rather than 16 (a uniform array will always expand each array element to the size of a vec4).
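
To put rough numbers on that, here is a small CPU-side sketch. It assumes the rule stated above (every uniform-array element padded to one vec4); the actual budget is driver dependent and can be queried, for example with glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, ...). AtlasEntry and the helper functions are illustrative names only.

```cpp
#include <cstddef>

// CPU-side mirror of one merged entry: xy = UV scale, zw = UV offset.
struct AtlasEntry {
    float scale_u, scale_v, offset_u, offset_v;
};
static_assert(sizeof(AtlasEntry) == 16, "one entry should match one vec4");

// vec4 slots used by mat4 MVPs[], int atlasIDs[] and two separate vec2 arrays,
// assuming each array element is padded to a full vec4.
constexpr std::size_t vec4_slots_separate(std::size_t instances, std::size_t entries) {
    return instances * 4 + instances * 1 + entries * 2;
}

// Same data with scale and offset merged into a single vec4 array.
constexpr std::size_t vec4_slots_merged(std::size_t instances, std::size_t entries) {
    return instances * 4 + instances * 1 + entries * 1;
}

// Bytes for the id array: padded to vec4 vs. tightly packed ints (std430).
constexpr std::size_t id_bytes_padded(std::size_t n) { return n * 16; }
constexpr std::size_t id_bytes_packed(std::size_t n) { return n * 4; }
```

For the 100/100 configuration from the question this gives 700 slots with separate arrays and 600 with the merged one, so the matrices remain by far the largest consumer.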

If you can have as many distinct atlases as you have instances, there doesn’t seem to be much point in having the atlasIDs array. You may as well just index atlasScales and atlasOffsets with the instance ID.

Regarding texture indexing: bear in mind that an array of samplers can only be indexed with a dynamically-uniform expression, and even per-instance values aren’t guaranteed to be dynamically uniform. Array textures don’t have this constraint.


WOW, thanks! You gave me a lot of new information!

I guess that in my case I need atlases, because I’m going to create something like procedural terrain in Minecraft, where the same textures are reused many, many times. The only thing I’m not sure about is whether to generate plane shapes for each tile side and then populate them with instancing, or to generate a unique mesh for a specific chunk each time it updates. I’m afraid that the second solution may be slow for complex chunks, but the first one needs to store many matrices or, at least, vec3s if I use a specific shader for them.

Yes, you’re right. An SSBO is what I need and I’m definitely going to implement this in the future.

How old were those posts you refer to?

In the olden days (13+ years ago), a texture atlas was all you had.

However, nowadays 2D texture arrays or bindless textures are a much better way to go for something like this. Both get rid of the derived need for atlas scales and offsets when batching multiple instances or sub-draws within a single draw call.

With 2D texture arrays, you just index into an array of 2D images. With bindless texture, you index into an array of bindless texture handles. No texcoord (s,t) scaling and shifting required. And wrap modes and minification sampling work just as they would for a normal 2D texture. This is in contrast to texture atlases, where you don’t get this behavior.
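
To illustrate the wrap-mode point, here is a CPU-side sketch of the atlas remap from the shader earlier in the thread. Vec2 and both functions are illustrative names. With a plain remap, a UV outside [0, 1] lands in a neighbouring atlas tile instead of repeating, so repeat has to be emulated by hand. In a real shader that emulation would have to happen per fragment, and it does not fix bleeding between tiles under mipmapping.

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Plain atlas remap: uv * scale + offset.
Vec2 atlas_uv(Vec2 uv, Vec2 scale, Vec2 offset) {
    return {uv.x * scale.x + offset.x, uv.y * scale.y + offset.y};
}

// Emulated repeat wrapping: take the fractional part first so the
// result always stays inside the tile described by scale/offset.
Vec2 atlas_uv_repeat(Vec2 uv, Vec2 scale, Vec2 offset) {
    Vec2 wrapped{uv.x - std::floor(uv.x), uv.y - std::floor(uv.y)};
    return atlas_uv(wrapped, scale, offset);
}
```

For a tile covering u in [0.5, 0.75), an input u of 1.5 maps to 0.875 with the plain remap (outside the tile) and to 0.625 with the wrapped version. With a texture array none of this is needed: the layer index selects the image and the sampler’s own wrap mode applies.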

This is spam prevention for new users. Keep using the site and you’ll be able to post links soon.

For now, just put a few spaces in the URL or remove the http:// so that it’ll post, and we’ll fix it up for you.


The post, remove the extra space character -

But aren’t texture arrays unsupported on some platforms? I’m not completely sure, but if I want to port my game to other platforms in the future, I don’t want to be blocked by this.

After a few minutes of thinking
Maybe I can rewrite my renderer for these platforms to use an atlas, ok. But how big can my sampler array be (uniform sampler2D samplers[NUM];, I guess)? Can I store them all inside an SSBO?

The wiki (Shader_Storage_Buffer_Object#OpenGL_usage) says how to set it up, but for me it wasn’t completely clear how to access that data from inside the shader (I’m a noob, maybe it’s obvious for everyone else), so in case someone wants to know:
Wiki example

layout(std430, binding = 3) buffer layoutName
{
    int data_SSBO[];
};

Change to

layout(std430, binding = 3) buffer layoutName
{
    int data_SSBO[];
} ssboData;

and access it like ssboData.data_SSBO[123].

They’re supported on OpenGL 3.0 and later, or with the EXT_texture_array extension.

Also, note that in OpenGL 3.x, arrays of samplers can only be accessed with a constant expression. This restriction was relaxed to allow indexing with a dynamically-uniform expression in OpenGL 4.0.

IOW, array textures are more portable than arrays of samplers. The main downside of array textures is that the individual layers must be homogeneous; the dimensions, format, and filter/wrap modes apply to the texture as a whole, and thus apply to all layers. But if they’re being applied to instances, that’s unlikely to be a problem.

You can’t store sampler types in an SSBO or a UBO, only in default-block uniform variables. Also, you can only refer to as many textures as there are texture units. If you want to avoid those restrictions, there’s the ARB_bindless_texture extension (which hasn’t been incorporated into any core version).


GClements answered this, but just to clarify since it sounds like you might be confused: “Texture Arrays” and “Arrays of Samplers” are two totally different things. I’m referring to Texture Arrays. (You can play with Arrays of Samplers if you want, but in my experience, these are a dead-end in OpenGL. Very limiting.)

And what’s the minimum OpenGL version you are designing your game for? If it’s OpenGL 3.0 or higher, then you’re guaranteed to have texture arrays.

Re “how big” and texture arrays, I believe in OpenGL 3.0+ you’re guaranteed at least 256 texture layers, and in OpenGL 4.5+, you get at least 2048 texture layers. Check support for specific GPUs and drivers here:

And if you need even more flexibility, Bindless Textures are worth considering, unless you need to support truly ancient GPUs and drivers.

With Bindless Textures, you effectively can. It works very well.


Yes, at first I thought that sampler arrays and texture arrays were the same thing :)
Thank you all. I’m going to check out all these features, but a bit later today

Thanks a lot. I just finished with bindless textures and it seems to be a lot easier than an atlas. I think I will still use atlas maps for things such as sprite fonts, but for models I’m going to use bindless textures since they feel much more flexible at runtime. There’s no need to extend the atlas when a new item requires it, then unbind and bind the whole atlas texture again. Also, with an atlas it is impossible to scroll UVs correctly, which may actually be important sometimes.
This is what I’ve got: all objects are united in a single vertex buffer and each object holds a uint64 texture handle, which I use in the render tick.

The wiki says how to implement them in code, but to summarize…

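glGenTextures(1, &gl_texture_binding_);
glBindTexture(GL_TEXTURE_2D, gl_texture_binding_);
// ... upload image data and set sampler parameters here ...
glBindTexture(GL_TEXTURE_2D, 0);

// Texture state becomes immutable once a handle has been created for it.
handle_arb_ = glGetTextureHandleARB(gl_texture_binding_);
// The handle must be resident before any shader samples through it.
glMakeTextureHandleResidentARB(handle_arb_);

And cleanup…

// Release residency before deleting the texture.
glMakeTextureHandleNonResidentARB(handle_arb_);
handle_arb_ = 0;
glDeleteTextures(1, &gl_texture_binding_);
gl_texture_binding_ = 0;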

These are my shaders

#version 450

layout(location=0) uniform mat4 MVPs[230];

in vec3 vCol;
in vec2 vUV;
in vec3 vPos;

out uint InstanceID;
out vec3 color;
out vec2 uv;

void main()
{
    InstanceID = gl_InstanceID;
    gl_Position = MVPs[gl_InstanceID] * vec4(vPos, 1.0);
    color = vCol;
    uv = vUV;
}


#version 450
#extension GL_ARB_bindless_texture : require

layout(bindless_sampler, location=230) uniform sampler2D bindless[230];

flat in uint InstanceID;
in vec3 color;
in vec2 uv;

void main()
{
    gl_FragColor = texture(bindless[InstanceID], uv) * vec4(color, 1.0);
}

It seems that having more than 1 instance causes a problem. I tried to duplicate that hexagon object and attach a different texture to it, and it just has no texture at all. It exists in the scene and overlaps other objects, but it is completely black. If I comment out my first hexagon spawn, then the second becomes fully visible. Are there some tricks with binding uniform bindless handles? I did some tests and it seems that only the sampler at index 0 works fine.
The problem is shown in the picture.

My C++ code for instances of mesh

const uint instance_count = std::min(mesh_objects.value.length(), render_list::objects_count_limit);
glm::mat4* mvp_array = new glm::mat4[instance_count];
//int* atlas_entry_id_array = new int[instance_count];
uint64* texture_handle_array = new uint64[instance_count];
{
    uint i = 0;
    for (const auto& object : mesh_objects.value)
    {
        mvp_array[i] = view_projection_matrix * get_model_matrix(object);
        //atlas_entry_id_array[i] = 5;
        texture_handle_array[i] = object->get_texture()->get_handle_arb();
        if (++i == instance_count) break;
    }
}
glUniformMatrix4fv(0, instance_count, GL_FALSE, reinterpret_cast<float*>(mvp_array));
//glUniform1iv(230, instance_count, atlas_entry_id_array);
glUniformHandleui64vARB(230, instance_count, texture_handle_array);
delete[] mvp_array;
//delete[] atlas_entry_id_array;
delete[] texture_handle_array;
glDrawArraysInstanced(GL_TRIANGLES, mesh_objects.value.vertex_buffer_offset, mesh_objects.value.size_in_vertex_buffer, instance_count);

Something tells me that it is somehow related to texture arrays and sampler arrays… I think I have to dig in this direction :)

Which GPU and driver?

To make sure there’s nothing wrong with your 2nd texture, I’d do this:

  1. Pass in 2 texture handles
  2. Do an instanced draw with 1 instance
  3. Lookup the bindless handle using gl_InstanceID+1 instead of gl_InstanceID.

If that works, your second texture is fine (and the problem is likely the following).

The one key thing to keep in mind about bindless texture is that the sampler expression needs to be dynamically uniform (except on NVIDIA, where you can pretty much do whatever you want). See the warning in this short section on the Bindless Texture wiki page:

Daisy-chain over and read about Dynamically Uniform Expressions, and note what it says about gl_DrawID and gl_InstanceID.

Bottom line, you can do what you’re doing now on NVIDIA GPUs/drivers and there’ll be no problem – assuming no bugs in your code. But on other GPU vendors, to follow the dynamically uniform requirement, you may instead want to use a multi-draw call (with 1 subdraw per instance) and index into your texture handle array using gl_DrawID. For instance:

The advantage of the first is you can continue to use the same, single vertex attr VBO for all of your “instances” (sub-draws in the multi-draw), just like with glDrawArraysInstanced().

NVIDIA GeForce RTX 2070, driver 465.89 from 03.30.21
But thanks for the tip, I’m not targeting only NVIDIA users

Thx, I’ll try it now
How I tested: in my info-collecting array I passed all texture handles at index 0 and rendered only 1 instance

Nope, everything becomes completely black. I tried changing the handle assignment during render to

texture_handle_array[i] = mesh_objects.value[mesh_objects.value.length() - 1 - i]->get_texture()->get_handle_arb();

(taking the texture handle of the opposite object; this has no effect with a single instance) and the texture of the second was successfully applied to the first tile, while the second instance was still black. So, I guess, the second texture itself is fine.

I tried changing the spawn order. Now the previously-second one is rendered successfully with its corresponding texture, while the previously-first one has this problem.

I guess it happens with every instance after the first. It also happens only inside a single draw call. Breaking up 5 instances into packs of 2 makes only every second one black.

I’m going to try out glMultiDrawArraysIndirect later.

GL_NV_vertex_attrib_integer_64bit is unsupported on my system, and this paragraph of the wiki calls attention to it. This may be the reason, I hope…

All this time I was actually using my integrated "Intel(R) UHD Graphics 630"

If someone faces such troubles, check your glGetString(GL_RENDERER) output. I’ll put its output in the window title, so players will be able to know which GPU they are using.

Works like a charm
Bindless textures, all 5 instances in a single draw call

However, I need to implement some support for such “GPUs”. For example, I can still use bindless textures but draw only 1 instance per draw call (it worked for me, but it doesn’t seem to be a good idea)


Right. glMultiDrawArraysIndirect() or glMultiDrawElementsIndirect() should fix that right up. Just put one instance per sub-draw record, and look up your bindless texture handle using gl_DrawID.
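
A rough sketch of building such sub-draw records on the CPU follows. The struct field order is the DrawArraysIndirectCommand layout from the OpenGL specification; build_commands is an illustrative name, and it assumes all sub-draws use the same mesh range in one shared vertex buffer.

```cpp
#include <cstdint>
#include <vector>

// Layout expected by glMultiDrawArraysIndirect (four tightly packed uints).
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // vertices per sub-draw
    std::uint32_t instanceCount;  // always 1 here: one instance per sub-draw
    std::uint32_t first;          // first vertex in the shared vertex buffer
    std::uint32_t baseInstance;   // left at 0 in this sketch
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "must be tightly packed");

// Build one sub-draw record per object that shares the same mesh range.
std::vector<DrawArraysIndirectCommand> build_commands(std::uint32_t first_vertex,
                                                      std::uint32_t vertex_count,
                                                      std::uint32_t object_count) {
    std::vector<DrawArraysIndirectCommand> commands;
    commands.reserve(object_count);
    for (std::uint32_t i = 0; i < object_count; ++i) {
        commands.push_back({vertex_count, 1u, first_vertex, 0u});
    }
    return commands;
}
```

The records would then be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and drawn with glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr, drawcount, 0), where a stride of 0 means tightly packed. On the shader side the MVP and texture handle arrays would be indexed with the draw index instead of gl_InstanceID. Note that gl_DrawID is core in GLSL 4.60; with #version 450 it is exposed as gl_DrawIDARB through GL_ARB_shader_draw_parameters.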

A related note that you probably won’t care about for a long while…:

While instanced draw calls or MDI (MultiDrawIndirect) draw calls can be very efficient and save a lot of needless CPU time dispatching draw work, using them with trivial instances (few vertices per instance or sub-draw) probably won’t max out your GPU’s vertex throughput. There are several options to achieve higher vertex throughput, but they range from somewhat to considerably more complex to implement than simple instancing or MDI. You almost certainly don’t need this right now. In case you (or someone reading this thread later) cares, here’s a thread on that…


Ok, thank you!
I guess I’m done with instanced texturing for now. Next I have to implement an index buffer, or delay it and just start implementing physics, but I think that this thread has come to its logical end.
