Perf problems subloading to large texture array

Dark_Photon · September 29, 2016, 2:08pm

I’m trying to get to the bottom of a major slowdown when subloading slices into a 2D texture array (on NVidia drivers). Subload calls go from taking tenths or hundredths of a millisecond to taking tens or hundreds of milliseconds each (roughly 1000x slower)!

Given the number of subloads, the app is basically hung for 20 seconds inside glCompressedTexSubImage3D calls during this period, and the effective texel upload rate is pitiful (~2 MB/sec).

Moreover, this slowdown occurs when subloading into the same MIP levels of the same slices of the same 2D texture array. The first few times it’s fast. Then, for a period of ~20 seconds, all of the subloads are massively slow.

Does this remind anyone of similar problems they’ve encountered in the past? Any tips to share to streamline texture array subloading?

Something the app is doing is clearly triggering a slow path in the NVidia driver, but I’m still trying to figure out what that is.

Thanks.

Dark_Photon · October 21, 2016, 5:56pm

Driver voodoo magic like this can drive you nuts when it’s not working for you.

In case someone else hits this, I found two workarounds for this NVidia driver quirk. Before I mention them, I should say that the preconditions for having this problem are:

1. NVidia GL drivers,
2. texture arrays that contain “a lot” of slices (hundreds or thousands),
3. texture array MIPs are allocated but not initialized on startup (e.g. via gl*TexStorage3D, or gl*TexImage3D with a NULL pointer),
4. texture array slice MIPs are subloaded individually later (via gl*TexSubImage3D).

In this case, subloading individual slices into the texture array(s) can get “insanely” slow! Why? No clue, but if an NVidia driver dev is reading, I’d love to know! From all my testing, it’s like the NVidia driver chooses a very inefficient default method for subloading slices for this use case. I say “default method” because it is apparently possible to influence it to choose another method.

With that preamble, here are two different ways of talking to the NVidia GL driver that (used separately) appear to clear up the driver logjam when subloading slices into the texture arrays:

1. Always allocate-and-initialize texture array MIPs up front (i.e. gl*TexImage3D with a non-NULL pointer). If that means you allocate a half-GB block of memory to feed in complete garbage, just do it. It can speed up future subloads by 1000x. Or be creative and feed in an existing readable mmapped pointer to some junk block that’s long enough to satisfy the load.
2. Stream the texture array subloads in via a PBO filled with an efficient Buffer Object Streaming method.

Either seems to do the trick, but what’s really going on down there, I have no idea.
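For anyone trying the first workaround, here is a rough sketch of how the junk block could be sized. It is not from my app: the helper names (`mip_dim`, `level_bytes`, `alloc_junk_block`) are made up for illustration, and it assumes a block-compressed format with 4x4 texel blocks at 8 or 16 bytes per block (e.g. DXT1 or DXT5). The GL call is left in a comment because it needs a live context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Size of MIP `level` along one axis, clamped to 1 texel. */
static int mip_dim(int base, int level)
{
    int d = base >> level;
    return d > 0 ? d : 1;
}

/* Bytes needed to fill one MIP level of a block-compressed 2D texture array.
 * ASSUMPTION: 4x4 texel blocks; block_bytes is 8 (e.g. DXT1) or 16 (e.g. DXT5). */
static size_t level_bytes(int base_w, int base_h, int layers,
                          int level, size_t block_bytes)
{
    assert(base_w > 0 && base_h > 0 && layers > 0 && level >= 0);
    size_t blocks_x = ((size_t)mip_dim(base_w, level) + 3) / 4;
    size_t blocks_y = ((size_t)mip_dim(base_h, level) + 3) / 4;
    return blocks_x * blocks_y * block_bytes * (size_t)layers;
}

/* Allocate one zero-filled junk block sized for level 0, the largest level,
 * so the same pointer can be reused for every smaller level.
 * Returns NULL on allocation failure; the caller frees the block. */
static void *alloc_junk_block(int base_w, int base_h, int layers,
                              size_t block_bytes, size_t *out_bytes)
{
    *out_bytes = level_bytes(base_w, base_h, layers, 0, block_bytes);
    return calloc(1, *out_bytes);
}

/* With a current GL context, the junk block would then initialize each level
 * (hypothetical variables w, h, layers, num_levels, internal_format):
 *
 *   for (int level = 0; level < num_levels; ++level)
 *       glCompressedTexImage3D(GL_TEXTURE_2D_ARRAY, level, internal_format,
 *                              mip_dim(w, level), mip_dim(h, level), layers, 0,
 *                              (GLsizei)level_bytes(w, h, layers, level, block_bytes),
 *                              junk);
 */
```

As a sanity check on the “half-GB” figure: under these assumptions, a 1024x1024 array with 512 slices at 16 bytes per block needs 512 MiB for level 0 alone. The junk block can be freed once every level has been loaded.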

For reference, here are some possibly related past posts:

- Possible NVidia Driver Bug ~ 319.49
- Performance problem PBO + glCompressedTexSubImage3D + Texture Array