A review of a Sprite drawing strategy

My results mirror yours. My implementation supports both as well. To your question above though…:

History note: As I recall, the MAP_PERSISTENT buffer upload support wasn’t developed to fix a speed problem with MAP_UNSYNCHRONIZED. It was developed because NVIDIA found through profiling that the MAP_UNSYNCHRONIZED technique thwarted full parallelism in their driver with the multithreaded driver option enabled (Threaded Optimization = ON).

As you can see here, they really dumped on the MAP_UNSYNCHRONIZED technique, as they wanted you to flip to MAP_PERSISTENT:

Hey, if it were faster, I’d have been sold! However, perf was basically the same in my experience (and I’ve re-verified that since).

MAP_UNSYNCHRONIZED is easier and conceptually simpler, as the app doesn’t have to explicitly over-allocate buffer space, insert fences, and wait on those fences if the GPU gets too far behind. However, it does require more driver mojo under the hood to multi-buffer VBO re-allocations (orphans) of the same size for fast “swap chain”-like behavior. MAP_PERSISTENT, OTOH, is more like what you’d do for Vulkan. So if current or future Vulkan cross-compatibility is required, then it’s the better choice.
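To make the extra app-side bookkeeping for MAP_PERSISTENT concrete, here is a rough sketch of the over-allocate / fence / wait pattern. It is only a CPU-side model, not anyone's actual implementation: `RegionRing`, `kRegions = 3`, `fenceChecks`, and the `waitForGpu` callback are hypothetical names and choices of mine, and the real GL calls (`glBufferStorage`, `glMapBufferRange`, `glFenceSync`, `glClientWaitSync`, `glDeleteSync`) appear only in comments marking where they would go.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a persistently mapped buffer split into kRegions
// equal regions (the "over-allocation"). In a real renderer the buffer would
// be created once with glBufferStorage(..., GL_MAP_WRITE_BIT |
// GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT) and mapped once with
// glMapBufferRange using the same flags.
struct RegionRing {
    static constexpr std::size_t kRegions = 3;   // assumption: triple buffering
    std::size_t regionBytes;
    std::array<bool, kRegions> fencePending{};   // stands in for one GLsync per region
    std::size_t current = 0;
    std::size_t fenceChecks = 0;                 // times we had to consult a fence

    explicit RegionRing(std::size_t bytes) : regionBytes(bytes) {
        assert(bytes > 0);
    }

    // Returns the byte offset the app may write this frame.
    // waitForGpu(region) stands in for glClientWaitSync(...) followed by
    // glDeleteSync(...); with a real fence it returns immediately if the GPU
    // has already finished reading that region.
    template <class WaitFn>
    std::size_t beginFrame(WaitFn&& waitForGpu) {
        if (fencePending[current]) {
            waitForGpu(current);
            fencePending[current] = false;
            ++fenceChecks;
        }
        return current * regionBytes;
    }

    // Call after issuing the draws that read the current region.
    // Real code would store glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0) here.
    void endFrame() {
        fencePending[current] = true;
        current = (current + 1) % kRegions;
    }
};
```

With three regions, the first three frames write at offsets 0, `regionBytes`, and 2 × `regionBytes` without touching a fence; the fourth frame wraps back to offset 0 and must first check the fence left by frame one. With MAP_UNSYNCHRONIZED plus orphaning, none of this lives in the app, because the driver does the equivalent multi-buffering behind the scenes.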

Now, I personally always turn Threaded Optimization = OFF because doing so leads to more consistent frame times (with or without MAP_UNSYNCHRONIZED use). And in my world, it’s all about hitting 60Hz, 90Hz, or 120Hz consistently every single frame, with minimal latency and with multisampling. None of this “30Hz mostly with blur/TAA” stuff. Here’s a very recent case where perf issues with Threaded Optimization = ON came up: