glGenerateTextureMipmap Optimization

I need some advice on how to optimize the mipmapping process of a 3D texture. Currently, I am calling glGenerateTextureMipmap once on the texture (when I first encounter it) to create the necessary mipmap levels, then each frame I use a series of compute dispatches to pass along changes made to the base level. Here is the Java code that creates and maintains the mipmap levels using a 3D engine’s compute shader interface (hopefully it is still clear enough):

// generate mipmap levels
int voxId = voxelMap.getImage().getId();
if (!mipsGenerated.contains(voxId)) {
    System.out.println("generate mips");
    GL45.glGenerateTextureMipmap(voxId);
    // ensure we don't attempt to generate mipmaps
    // for this image again
    mipsGenerated.add(voxId);
}

// create a view of the voxel map to write to
TextureImage target = new TextureImage(voxelMap, TextureImage.Access.WriteOnly);
// pass the voxel map to the compute shader
mipmapper.set("VoxelMap", ArgType.Texture, voxelMap);
// pass the target voxel map view to the compute shader
mipmapper.set("TargetLevel", ArgType.Image, target);
WorkSize work = new WorkSize();
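// n is declared earlier (not shown); it starts as the edge length
// of the base level and is halved for each level written
for (int i = 0; n >= 2; i++) {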
    // set the mipmap level to write to
    target.setLevel(i + 1);
    // set the mipmap level to read from
    mipmapper.set("SourceLevel", ArgType.Int, i);
    // execute the mipmap compute shader with
    // NxNxN work groups of size 1x1x1
    mipmapper.execute(work.set((n = n >> 1), 1));
}
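
For readers without the engine at hand, here is a minimal, engine-free sketch of the level and work-group bookkeeping that the loop above appears to perform. `MipPlan`, `Pass`, and `plan` are hypothetical names introduced for illustration only, and the sketch assumes a cubic, power-of-two base level.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the per-level passes a loop like the one above would issue. */
final class MipPlan {
    /** One downsampling pass: read sourceLevel, write targetLevel using groups^3 work groups. */
    static final class Pass {
        final int sourceLevel;
        final int targetLevel;
        final int groups;

        Pass(int sourceLevel, int targetLevel, int groups) {
            this.sourceLevel = sourceLevel;
            this.targetLevel = targetLevel;
            this.groups = groups;
        }

        @Override
        public String toString() {
            return "level " + sourceLevel + " -> level " + targetLevel
                    + " with " + groups + "^3 work groups";
        }
    }

    /** baseSize is the edge length of level 0 (assumed cubic and a power of two). */
    static List<Pass> plan(int baseSize) {
        List<Pass> passes = new ArrayList<>();
        int n = baseSize;
        for (int i = 0; n >= 2; i++) {
            n >>= 1; // edge length of the level being written
            passes.add(new Pass(i, i + 1, n));
        }
        return passes;
    }

    public static void main(String[] args) {
        for (Pass p : plan(256)) {
            System.out.println(p);
        }
    }
}
```

For a 256^3 base this yields eight passes, the first writing level 1 with 128^3 work groups and the last writing level 8 with a single work group.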

The problem with this approach is that glGenerateTextureMipmap is somehow eating up ~30ms on the CPU each frame when using a 256^3 texture, even though I’m not calling it every frame. If I don’t call it at all, I get those 30ms back. Also, the compute chain I’m using every frame doesn’t have much performance impact at all (i.e. skipping it doesn’t return any significant performance).

I realize 256^3 may be an unreasonably large texture, but I am concerned about how significant a bottleneck this operation is turning out to be.

Which GPU and driver is this on?

Also, what’s the GL internal format of this 3D texture?

As a datapoint, I have seen glGenerateTextureMipmap() add ~0.1 msec per MIP level to total frame time for small textures (**). Though this is for 2D textures on high-end NVIDIA GPUs. I would expect cost might be quite a bit greater for 3D textures, large textures, and/or on low-end systems.

That said, based on my limited experience here, one thing you might try is limiting the number of MIP levels you let the driver generate with the glGenerateTextureMipmap() call, by setting GL_TEXTURE_BASE_LEVEL and GL_TEXTURE_MAX_LEVEL. See if that reduces the total time. I’m also curious to see how your time scales with the number of MIP levels you let it generate. For my case with small 2D textures, it was roughly linear.
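
A minimal sketch of that experiment, under some assumptions: the `Gl` interface and the `Recorder` class are hypothetical seams added so the call order can be tried without a GL context. In a real LWJGL program the two methods would presumably forward to `GL45.glTextureParameteri` and `GL45.glGenerateTextureMipmap`. The two constants are the standard OpenGL enum values.

```java
import java.util.ArrayList;
import java.util.List;

final class LimitedMipGen {
    // Standard OpenGL enum values.
    static final int GL_TEXTURE_BASE_LEVEL = 0x813C;
    static final int GL_TEXTURE_MAX_LEVEL = 0x813D;

    /** Hypothetical seam over the two GL 4.5 direct-state-access calls used here. */
    interface Gl {
        void textureParameteri(int texture, int pname, int param);
        void generateTextureMipmap(int texture);
    }

    /** Asks the driver to generate only levels 1..maxLevel from level 0. */
    static void generateUpTo(Gl gl, int texture, int maxLevel) {
        gl.textureParameteri(texture, GL_TEXTURE_BASE_LEVEL, 0);
        gl.textureParameteri(texture, GL_TEXTURE_MAX_LEVEL, maxLevel);
        gl.generateTextureMipmap(texture);
    }

    /** Records calls instead of issuing them, for trying the sequence without a context. */
    static final class Recorder implements Gl {
        final List<String> calls = new ArrayList<>();

        @Override
        public void textureParameteri(int texture, int pname, int param) {
            calls.add("param " + Integer.toHexString(pname) + "=" + param);
        }

        @Override
        public void generateTextureMipmap(int texture) {
            calls.add("generate");
        }
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        generateUpTo(recorder, 1, 2);
        recorder.calls.forEach(System.out::println);
    }
}
```

Note that GL_TEXTURE_MAX_LEVEL also limits which levels sampling can use, so it may need to be set back afterwards if the shader is meant to read the full chain.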

With this, it’s always worth considering that the bottleneck might not be the native GL driver or your GPU. It could be the mapping of your Java to the underlying bottom-layer graphics code. For instance, you’re not rendering this on a GL translation layer like ANGLE, are you?

Also, you mention:

It could be that this or even your system’s graphics driver might be generating the texture MIPs on the CPU instead of the GPU (which would be bad for perf). Either that, or it could be that it’s being done on the GPU, but your CPU is having to stall waiting on the GPU to catch up with that expensive operation. Doing more detailed frame profiling can likely establish which of these cases is happening here. See exactly where within CPU frame submission you’re stalling. Also, using CPU and GPU profiler tools like Very Sleepy, Tracy Profiler, and Nsight Systems can help point this out as well.
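
As a starting point before reaching for a full profiler, here is a rough sketch of per-section CPU timing with `System.nanoTime()`. `FrameTimer` and its methods are hypothetical names, not part of LWJGL or any engine. The idea is to wrap each part of frame submission (the mipmap call, the compute dispatches, the buffer swap) in its own named bucket and see which one absorbs the time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: accumulates wall-clock CPU time per named section of the frame. */
final class FrameTimer {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Runs the section and adds its elapsed CPU-side time to the named bucket. */
    void time(String name, Runnable section) {
        long start = System.nanoTime();
        section.run();
        long elapsed = System.nanoTime() - start;
        totalNanos.merge(name, elapsed, Long::sum);
        counts.merge(name, 1, Integer::sum);
    }

    /** Number of times the named section has been timed. */
    int samples(String name) {
        return counts.getOrDefault(name, 0);
    }

    /** Average time per call in milliseconds, or 0 if the section was never timed. */
    double averageMillis(String name) {
        int n = samples(name);
        return n == 0 ? 0.0 : totalNanos.get(name) / 1e6 / n;
    }

    public static void main(String[] args) {
        FrameTimer timer = new FrameTimer();
        for (int frame = 0; frame < 3; frame++) {
            // Stand-in for real work such as the mipmap call or the dispatch loop.
            timer.time("placeholder work", () -> Math.sqrt(42.0));
        }
        System.out.println(timer.averageMillis("placeholder work") + " ms average");
    }
}
```

One caveat: GL calls are queued asynchronously, so a stall tends to show up at the first call that has to wait on the GPU, not necessarily at the call that queued the expensive work. Temporarily adding a `glFinish()` at the end of a section is one blunt way to charge GPU time to the section that caused it, at the cost of distorting the frame time itself.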


** (It is somewhat interesting that 0.1 msec * 256 slices = 25.6 msec, which is pretty close to your 30 msec. But I’m sure that’s complete coincidence.)

I’m on an Nvidia GeForce GTX 1060 6GB. I believe the driver version is 560.35.03. The internal format of the texture is R32F; decreasing to R16F makes it a little less slow, though not by much. On the Java side, there are no translation layers present that I am aware of. I’m using LWJGL and the 3D engine, that’s it.

While fiddling around with the problem, the 30ms drop per frame happened to “go away.” I don’t know what ended up “fixing” it, but I think it’s probable that I mistakenly attributed the 30ms drop to mipmap generation when it was actually caused by something else related. Now, after an initial slowdown from mipmap generation, the program runs fairly smoothly. This will certainly require more investigation on my part!

Since you were interested, I did some rudimentary CPU profiling as well using Java’s System.nanoTime(). Here are the results:

CPU average times to compute N mipmaps:
9: 82.7ms
5: 82.3ms
2: 82.0ms
0: 0.3ms

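The 3D engine’s profiler reports ~72ms on the GPU. I believe the CPU is slow because it’s blocking compute dispatches until the GPU finishes the mipmapping. The time is nearly flat once any levels are generated, though, rather than linear as you’d described for 2D textures. That would fit a 3D texture, where each level has one eighth the voxels of the one before, so the first level or two dominate.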
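
A rough way to sanity-check those timings against the size of each level: the sketch below counts how many voxels get written when generating the first N levels of a 256^3 texture. Treating voxel count as a proxy for cost is an assumption, since the driver's actual work per level is unknown, and `MipCost` and `voxelsWritten` are hypothetical names.

```java
/** Hypothetical helper: voxel counts as a crude proxy for mipmap generation cost. */
final class MipCost {
    /** Total voxels written when generating levels 1..levels from a cubic base of edge baseSize. */
    static long voxelsWritten(int baseSize, int levels) {
        long total = 0;
        int n = baseSize;
        for (int level = 1; level <= levels && n > 1; level++) {
            n >>= 1; // edge length of this level
            total += (long) n * n * n;
        }
        return total;
    }

    public static void main(String[] args) {
        long full = voxelsWritten(256, 8);
        for (int levels : new int[] {2, 5, 8}) {
            long written = voxelsWritten(256, levels);
            System.out.printf("%d levels: %d voxels (%.1f%% of the full chain)%n",
                    levels, written, 100.0 * written / full);
        }
    }
}
```

By this count, levels 1 and 2 alone account for 2,359,296 of the 2,396,745 voxels in the full chain, which is in line with the measured times barely changing between 2, 5, and 9 levels.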

I don’t currently have better profiling methods, but I will certainly look into some of the ones you mentioned, especially since I’m heading into the optimization stage of my project.