Does anyone here think this might have a performance benefit? I noticed that we have 32 texture slots to play with. If I wrote a texture manager to use all 32 slots (leaving recently used textures in inactive slots to reduce texture change overhead) and then changed the shader’s texture slot IDs to match, the extra slots could be used to keep textures from being needlessly brought in and put away almost like Carmack’s Megatexture. So if I used textures A, B, C, then A, then B, the engine would reuse the least fresh texture slots to cache A, B and C but when it went back to A and B it wouldn’t need to switch back to them; it’d just change the texture slot IDs exposed to the shader thus making it return to the previously loaded slots.
Would this have a performance benefit or would there be a downside to holding so many textures in cache? When it comes down to it, it’s intended to produce a similar outcome to Megatexture whilst keeping the images separate.
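For reference, the bookkeeping described above can be sketched roughly as follows. This is only a minimal illustration of the least-recently-used slot reuse, not a claim that it is faster. The class name `TextureUnitCache` and the method `acquire` are made up for this example, and no OpenGL calls are issued; the comments mark where a real engine would presumably call `glActiveTexture`/`glBindTexture` and update the sampler uniform.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical sketch: tracks which texture ID sits in which texture unit
// and reuses the least recently used unit once every unit is occupied.
class TextureUnitCache {
public:
    explicit TextureUnitCache(int unitCount) : unitCount_(unitCount) {}

    // Returns the unit that holds `texture`. `rebound` is set to true when
    // the texture was not resident and a real engine would have to call
    // glActiveTexture(GL_TEXTURE0 + unit) and glBindTexture(...) here.
    // In both cases the caller would then point the sampler uniform at `unit`.
    int acquire(std::uint32_t texture, bool& rebound) {
        auto it = lookup_.find(texture);
        if (it != lookup_.end()) {
            // Hit: mark the entry as most recently used.
            lru_.splice(lru_.begin(), lru_, it->second);
            rebound = false;
            return it->second->unit;
        }

        rebound = true;
        int unit;
        if (static_cast<int>(lru_.size()) < unitCount_) {
            // A unit is still free: hand out the next unused index.
            unit = static_cast<int>(lru_.size());
        } else {
            // All units are taken: evict the least recently used entry.
            unit = lru_.back().unit;
            lookup_.erase(lru_.back().texture);
            lru_.pop_back();
        }
        lru_.push_front(Entry{texture, unit});
        lookup_[texture] = lru_.begin();
        return unit;
    }

private:
    struct Entry {
        std::uint32_t texture;
        int unit;
    };

    int unitCount_;
    std::list<Entry> lru_;  // front = most recently used
    std::unordered_map<std::uint32_t, std::list<Entry>::iterator> lookup_;
};
```

With three units and the sequence A, B, C, A, B, only the first three calls report a rebind; the last two return the units already holding A and B.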
First of all, before you do such ‘optimizations’, please make sure that switching textures is actually the performance bottleneck in your engine. I can tell you that today you can do more than 2000 (non-redundant!) texture switches in a single frame and still have frame rates above 100fps.
Second, you seem to assume that binding a texture to a texture unit has something to do with “bringing a texture into the cache”. It does not.
and then changed the shader’s texture slot IDs to match, the extra slots could be used to keep textures from being needlessly brought in and put away
You seem to think that it is faster to use glUniform() in order to change the sampler uniform instead of using glActiveTexture()/glBindTexture() in order to connect the shader to a certain texture object.
I don’t think your approach will be any faster. On the contrary, it might be a lot slower, since it is a code path that is unlikely to be optimized much by the driver.
When it comes down to it, it’s intended to produce a similar outcome to Megatexture whilst keeping the images separate.
Megatexturing is the exact opposite of this approach. It is intended to fake a single huge texture that can be a lot bigger than the available GPU memory.
Your approach would suffer from the same problems as any tiled approach: memory limitations, geometry restrictions (geometry must be split at tile boundaries) and special handling at tile borders.
The texture cache is filled only by accessing texture data. Every texel that is fetched is stored in the GPU cache in the hope that it will be reused. What is specific to a texture cache, compared with other caches, is that texture memory is stored in a swizzled ("twiddled") order such as Z-order (http://en.wikipedia.org/wiki/Z-order_(curve)), or is at least grouped into blocks of pixels, like the 4x4 pixel blocks of the DXT formats. This increases cache reuse and makes each fetch large enough to be worthwhile.
A GPU (and actually a CPU as well) can’t fetch 32 bits from the middle of a texture just like that. It fetches a minimum length of data. If, for example, this minimum length is 64 bits and we only want 32 bits, we still fetch 64 bits from memory. Hence it’s necessary to make the best of cache reuse to limit wasted bandwidth.
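To make the Z-order idea concrete, here is a minimal sketch of the textbook bit-interleaving (Morton) index for a texel at (x, y). The function names `part1By1` and `mortonIndex` are just labels for this example, and actual GPU memory layouts are vendor specific and may well differ; the point is only that texels close together in 2D end up close together in memory.

```cpp
#include <cstdint>

// Spreads the lower 16 bits of v so that a zero bit sits between each of them.
inline std::uint32_t part1By1(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Interleaves the bits of x (even bit positions) and y (odd bit positions).
// Each coordinate must fit in 16 bits.
inline std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y) {
    return part1By1(x) | (part1By1(y) << 1);
}
```

With this ordering the four texels of any aligned 2x2 block get four consecutive indices, and the sixteen texels of an aligned 4x4 block get sixteen consecutive indices.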
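The arithmetic behind that can be sketched like this, assuming memory is read in fixed-size, aligned chunks. The helper name `bytesFetched` and the chunk sizes used with it are illustrative only; real fetch granularities depend on the hardware.

```cpp
#include <cstddef>

// Returns how many bytes are actually read from memory when `size` bytes
// starting at byte `offset` are requested and memory can only be read in
// aligned chunks of `granule` bytes (granule must be non-zero).
inline std::size_t bytesFetched(std::size_t offset, std::size_t size,
                                std::size_t granule) {
    if (size == 0) {
        return 0;
    }
    const std::size_t firstChunk = offset / granule;
    const std::size_t lastChunk = (offset + size - 1) / granule;
    return (lastChunk - firstChunk + 1) * granule;
}
```

For the example in the post, `bytesFetched(0, 4, 8)` is 8: a 32-bit request costs a full 64-bit fetch. A request that straddles a chunk boundary, such as `bytesFetched(6, 4, 8)`, costs two chunks (16 bytes).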
I wouldn’t rely on having 32 texture slots either. OpenGL has defines for up to 32, but that doesn’t mean that the full 32 can be guaranteed to be available (and even if so there may be restrictions, like you can do 32 texture lookups but only have 8 unique textures (i.e. each texture can have up to 4 lookups)).
I wouldn’t rely on having 32 texture slots either.
Why not? The OpenGL 3.2 specification states that the minimum number of texture image units in each shader stage is 16. That means you can have 16 samplers in your vertex shader, 16 samplers in your geometry shader, and 16 samplers in your fragment shader, for a combined total of 48. This is the minimum that a 3.2 compliant implementation will provide.
4.0 increases this to 80, due to the addition of 2 new shader stages, each providing a minimum of 16 image units.
Unless you’re using a lower version of OpenGL, you can assume that you have at least this much room for textures.
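The numbers above are just 16 units per shader stage multiplied by the number of stages. A tiny sketch of that arithmetic follows; the constant and function names are made up for this example. At runtime the actual limits would be queried from the driver with `glGetIntegerv` using `GL_MAX_TEXTURE_IMAGE_UNITS` and `GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS`, which is not shown here because it needs a live GL context.

```cpp
// Minimum texture image units per shader stage, as stated in the post
// for OpenGL 3.2 and later.
constexpr int kMinUnitsPerStage = 16;

// Minimum combined texture image units for a given number of shader stages.
constexpr int minCombinedUnits(int shaderStages) {
    return kMinUnitsPerStage * shaderStages;
}

// OpenGL 3.2: vertex, geometry, fragment.
static_assert(minCombinedUnits(3) == 48, "3.2 minimum");
// OpenGL 4.0 adds the two tessellation stages.
static_assert(minCombinedUnits(5) == 80, "4.0 minimum");
```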
Granted, I seriously doubt that the OP’s plan will improve performance.
(and even if so there may be restrictions, like you can do 32 texture lookups but only have 8 unique textures (i.e. each texture can have up to 4 lookups)).
Note that these limitations only exist in pre-GL 3.0 hardware. And even then, only in ATI versions of pre-GL 3.0 hardware. Nowadays, you can reasonably expect to do texture sampling as much as you want.