Optimizing texture fetches with higher mip levels

Let’s say I have a shader program in OpenGL that renders a full-screen quad, and in the fragment shader I sample some huge textures at random texture coordinates. That is, each shader invocation uses one and the same texture coordinate for all of its texture samples, but the coordinate varies between invocations. These fetches cause a performance drop. I suspect that, because of the size of the textures, the GPU texture cache is not big enough and is not used efficiently.

Now I have a theoretical question: can I improve performance by using low-resolution mask textures (say 32x32), built by mipmapping the large textures? If the value in the mask at a given texture coordinate, read from some higher mip level, does not pass a test, then I don’t need to perform the texture fetches at full-size level 0. Something like this:

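vec2 tc = calculateTexCoordinates();
// Cheap test against the small mip level 5 version of the texture.
bool performHeavyComputations = testValue(textureLod(largeMippedTextureSampler, tc, 5.0));

float result = 0.0;
if (performHeavyComputations)
{
    // Only now fetch from the full-size level 0.
    // Assumes the value of interest is in the red channel.
    result += textureLod(largeMippedTextureSampler, tc, 0.0).r;
}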

About 50% of the texels at mip level 5 will not pass the test, so a lot of shader invocations should not need to sample the full-size textures.

But I am introducing branching in the code. Could this branching hurt performance more than always sampling the full-size texture, even when the sample is not needed? Different GPUs may behave differently, and some may not even support branching. Will those perform two fetches instead of one?

I can test this code on some machines later, but my question is theoretical.

And can you suggest other optimizations if this won’t work properly?

Read up on branch divergence here: Shader#Execution_model_and_divergence.

Basically, you might not want to do this. In blocks of fragments where all of the fragments can skip the inner block, you can avoid the lookup and save some perf. But in blocks where some need this texture lookup and some don’t, AFAIK they’re pretty much all going to do it. So how clustered are the groups of fragments/texels that can skip the hi-res lookup?
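One way to answer the clustering question is to look at it. The following is only a rough debug sketch, not part of the original code: it draws the result of the mip level 5 test as black or white, so you can see whether the fragments that would skip the fetch form large solid areas or a fine noise pattern. The names maskSource, threshold and vTexCoord are made up for illustration, and testValue here is an assumed stand-in (a simple threshold on the red channel) for whatever the real test is.

```glsl
#version 330 core

// Hypothetical names, chosen for this sketch only.
uniform sampler2D maskSource;   // the large mipmapped texture
uniform float threshold;        // assumed test parameter

in vec2 vTexCoord;              // replace with your own calculateTexCoordinates()
out vec4 fragColor;

// Assumed stand-in for the original testValue().
bool testValue(vec4 v)
{
    return v.r > threshold;
}

void main()
{
    bool pass = testValue(textureLod(maskSource, vTexCoord, 5.0));
    // White: this fragment would do the full-size fetch.
    // Black: this fragment would skip it.
    fragColor = vec4(vec3(pass ? 1.0 : 0.0), 1.0);
}
```

If the image looks like salt-and-pepper noise, most blocks of fragments will contain at least one fragment that takes the branch, and the skip is unlikely to save much.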

Also before you even go down this road…
I’d first make sure that you are really texture lookup limited. With the shader doing a single, normal texture lookup, replace your texture with a 32x32 texture. What perf do you see? If there is not much difference, then you don’t need to go down this path and should look elsewhere for your primary bottleneck.
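If swapping the texture on the application side is inconvenient, a variation of the same experiment can be done in the shader by forcing the lookup to a small mip level. This is a sketch under assumptions: debugLod and vTexCoord are hypothetical names, the texture must have a complete mip chain, and the sampler must use a mipmapping minification filter (for example GL_LINEAR_MIPMAP_NEAREST), otherwise the requested level has no effect. Level 5 is 32x32 only if level 0 is 1024x1024.

```glsl
#version 330 core

uniform sampler2D largeMippedTextureSampler;
uniform float debugLod;   // hypothetical: 0.0 = full size, 5.0 = small mip level

in vec2 vTexCoord;        // replace with your own calculateTexCoordinates()
out vec4 fragColor;

void main()
{
    // Same single lookup in both runs; only the mip level changes,
    // so the difference in frame time comes from the texture fetch.
    fragColor = textureLod(largeMippedTextureSampler, vTexCoord, debugLod);
}
```

Time a run with debugLod set to 0.0 and another with 5.0. If the two are close, the texture fetches are probably not the main cost.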
