Deterministic Frame Rate Capabilities?

Hi, I am new to Vulkan but a long-time user and proponent of OpenGL. One of the biggest shortcomings of every 3D graphics API I know of has been the inability of a software developer to maintain deterministic frame rates using OpenGL, even with the fastest modern GPUs and CPUs; that is, the inability of the user’s application to hold a target update rate such as 60 frames per second. If an “overload” occurs today in OpenGL or Direct3D, the application’s update rate simply drops to the next subinterval of the refresh rate when synced to the vertical retrace (which it normally is for training and simulation applications).

In my industry (training and simulation), this has always been one of the major shortcomings of modern GPUs driven through OpenGL; it is something the very old, proprietary, and expensive image generators of decades ago could achieve, at least via a rudimentary, user-specified priority scheme for rendering. Today, as we know, once the user’s application fills the OpenGL pipeline with rendering commands, it cannot be stopped or interrupted until it completes. Even if the application goes only a few microseconds over the threshold, the update rate automatically plummets to the next lower subinterval of the vertical refresh rate (i.e., 30 Hz when running on a 60 Hz display or projector). This is catastrophic to the suspension of disbelief in the simulator or the game, especially when the rendering sits on the verge of 60 Hz and oscillates continuously between 60 Hz and 30 Hz. Application developers spend a great deal of their time minimizing this artifact, but a deterministic frame rate is not something anyone can guarantee using OpenGL.
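To put numbers on how sharp that cliff is, here is a back-of-envelope sketch (illustrative values only, not from any real measurement) of how vsync rounds each frame up to a whole number of refresh intervals:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double refresh_hz = 60.0;
    const double vblank_ms  = 1000.0 / refresh_hz;   // ~16.67 ms per refresh

    // With vsync on, presentation waits for the next vertical blank, so the
    // effective rate is the refresh rate divided by the number of whole
    // refresh intervals the frame consumed.
    for (double frame_ms : {16.0, 16.7, 17.0, 33.0, 34.0}) {
        int intervals = static_cast<int>(std::ceil(frame_ms / vblank_ms));
        printf("%5.1f ms of work -> presents every %d vblank(s) -> %4.1f Hz\n",
               frame_ms, intervals, refresh_hz / intervals);
    }
}
```

Note that 16.7 ms of work, barely over budget, already lands at 30 Hz, and the next step down after that is 20 Hz, not something in between.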

Maybe Vulkan can address this (perhaps as an optional extension of some sort)? IMO, deterministic frame rates would be very, very useful to the simulator world as well as game developers if this feature were available.

Thoughts?

How could you possibly achieve “deterministic frame rates”?

I suppose one strategy could be some kind of prioritization of rendering operations. However, that’s not very likely to be viable. Unless you render stuff in priority order, there would be no way for the GPU to realize that it’s gone over budget and not render the lower priority things.

And if you draw in priority order, you create some limitations on your rendering processes. It can make rendering less efficient. What happens if you do a z-prepass but then the actual rendering of one of the occluders is considered low priority and cut out? How exactly would deferred rendering work under such a scenario, when the final steps are more or less fixed in cost and absolutely have to be done?

I don’t know that a hardware solution like that is viable. Without it, determinism can only be achieved by knowing, a priori, exactly how long a particular sequence of rendering commands will take to complete. And that’s effectively impossible.

After all, GPUs are shared devices, like CPUs. You don’t own the GPU; you share it with the OS and potentially every other application currently running. Maybe the OS decided to do some GPU cleanup and stole a millisecond of the GPU’s time from you. There’s nothing you can do about that, and Vulkan will most assuredly not prevent the OS from doing that.

Not to mention, you’d have to know how long commands take on every GPU you support. They’re going to have different performance characteristics, so you’d have to do a lot of benchmarking to know which operations will be expensive and which will not.
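That said, the measuring itself is at least explicit in Vulkan: timestamp queries bracket a stretch of GPU work and report ticks you convert using the device’s timestampPeriod limit. A minimal sketch, assuming the usual setup objects already exist (error checking, queue-support checks, and query-pool cleanup omitted):

```cpp
#include <vulkan/vulkan.h>

// Sketch: time a stretch of GPU work with timestamp queries. `device` and
// `cmd` are assumed to come from the application's normal Vulkan setup.
void RecordTimedSpan(VkDevice device, VkCommandBuffer cmd, VkQueryPool* outPool) {
    VkQueryPoolCreateInfo poolInfo = {};
    poolInfo.sType      = VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO;
    poolInfo.queryType  = VK_QUERY_TYPE_TIMESTAMP;
    poolInfo.queryCount = 2;
    vkCreateQueryPool(device, &poolInfo, nullptr, outPool);

    vkCmdResetQueryPool(cmd, *outPool, 0, 2);
    vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, *outPool, 0);
    // ... record the draws being measured ...
    vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, *outPool, 1);
}

// After the command buffer has finished executing on the GPU:
double ReadSpanNs(VkDevice device, VkQueryPool pool, float timestampPeriod) {
    uint64_t ticks[2];
    vkGetQueryPoolResults(device, pool, 0, 2, sizeof(ticks), ticks,
                          sizeof(uint64_t),
                          VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT);
    // timestampPeriod (from VkPhysicalDeviceLimits) converts ticks to ns.
    return (ticks[1] - ticks[0]) * static_cast<double>(timestampPeriod);
}
```

That tells you what a workload cost after the fact, per GPU; it still doesn’t tell you what the next frame will cost.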

That’s not to say that Vulkan cannot be more deterministic than OpenGL. One of the issues with GL is that a lot of performance-hurting operations are implicit. If you suddenly render with a texture you haven’t used in 400 frames, the driver may have paged it out of video memory, so rendering with it triggers a re-upload behind your back. You don’t know whether that happened, so a heavyweight operation appears lightweight.

In Vulkan, if you suddenly render with a texture you haven’t used in 400 frames… you know you’re doing that. You had to tell Vulkan specifically to make that texture available for use. When you stopped using it 400 frames ago, you had to tell Vulkan that you’re not using it anymore. And so forth.

You have explicit control over these things. So when you do a painful operation, you know you’ve done it. Furthermore, you have direct control over what gets committed to video memory. So you won’t unknowingly wander into a scenario where adding one more texture to the rendering suddenly makes you swap every frame.
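To make “explicit” concrete, here is a minimal sketch of what backing a texture with memory looks like; ChooseMemoryType is a hypothetical helper, not a Vulkan call, and error checking is omitted:

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: picks a memory type index satisfying the
// requirements; not part of the Vulkan API.
uint32_t ChooseMemoryType(uint32_t typeBits, VkMemoryPropertyFlags wanted);

// Sketch: in Vulkan, giving a texture its backing memory is an explicit,
// visible step, not something a driver does behind your back.
VkDeviceMemory BackImage(VkDevice device, VkImage image) {
    VkMemoryRequirements req;
    vkGetImageMemoryRequirements(device, image, &req);

    VkMemoryAllocateInfo alloc = {};
    alloc.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    alloc.allocationSize  = req.size;
    alloc.memoryTypeIndex = ChooseMemoryType(req.memoryTypeBits,
                                             VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);

    VkDeviceMemory memory;
    vkAllocateMemory(device, &alloc, nullptr, &memory); // the heavyweight step,
    vkBindImageMemory(device, image, memory, 0);        // and you can see it

    // When you retire the texture, freeing is equally explicit:
    // vkFreeMemory(device, memory, nullptr);
    return memory;
}
```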

Then there’s all of the CPU overhead that OpenGL has and Vulkan doesn’t. Added to that is the ability to thread the rendering process across CPU cores, ultimately removing the CPU as a limitation on GPU performance (and thus as a source of non-determinism).
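As a sketch of that threading point: each worker records into its own VkCommandPool (pools are externally synchronized, so they must not be shared across threads), and the resulting secondary buffers are stitched together on the submitting thread. RecordSceneChunk is a hypothetical stand-in for the per-thread workload; pool cleanup and error checking are omitted:

```cpp
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

// Hypothetical: records this thread's share of the scene.
void RecordSceneChunk(VkCommandBuffer cmd, int threadIndex);

void RecordInParallel(VkDevice device, uint32_t queueFamily, int threadCount) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([=] {
            // One pool per thread; command pools must not be used from
            // multiple threads without external synchronization.
            VkCommandPoolCreateInfo poolInfo = {};
            poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
            poolInfo.queueFamilyIndex = queueFamily;

            VkCommandPool pool;
            vkCreateCommandPool(device, &poolInfo, nullptr, &pool);

            VkCommandBufferAllocateInfo allocInfo = {};
            allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
            allocInfo.commandPool = pool;
            allocInfo.level = VK_COMMAND_BUFFER_LEVEL_SECONDARY;
            allocInfo.commandBufferCount = 1;

            VkCommandBuffer cmd;
            vkAllocateCommandBuffers(device, &allocInfo, &cmd);
            RecordSceneChunk(cmd, t);
        });
    }
    for (auto& w : workers) w.join();
    // The secondary buffers are then pulled into the primary command buffer
    // on the submitting thread with vkCmdExecuteCommands.
}
```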

Vulkan won’t make rendering deterministic. It won’t be able to tell you exactly how long a particular command buffer will take to execute on the GPU. But reasonable use of the API should smooth out the spikes relative to OpenGL, as well as make it apparent where most of them come from.

At the very least, you’ll be better able to know when you’ve done something expensive that could cause you to drop frames.

I acknowledge that with a general-purpose GPU, deterministic frame rates may be unachievable in the near term. I mention this now only because Vulkan is in its infancy, and maybe someone smarter than I am will someday think of a way, or design in “hooks” to help alleviate this problem in the future. IMO, achieving stable frame rates is one of the biggest headaches in the industry (games, simulators, …). Making the scene beautiful is one thing (and the industry has made great strides here), but making one’s application update rate more deterministic, without waiting for the application to overload, has never been done in the general-purpose GPU market, to my knowledge. Priority-based rendering within a timer-based pipeline (via API extensions, perhaps) may be one way; maybe there are other ways the hardware and software designers working on the next generations can help minimize this problem.

Actually, Mantle has both timestamp and “If” commands, so it is MAYBE possible there. But it would require relying on undefined behavior (the timestamp would have to be a UNIX-style value), and the timestamps themselves would have to be cheap enough not to wreck performance for this to even make sense. And all of this would have to make it into Vulkan, of course. Though, aren’t simulators usually developed for a fixed set of hardware? It seems to me it’s simpler (and cheaper) to dynamically reduce graphics quality on complex scenes, based on thorough testing, than to buy some specialized, magical GPUs.
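To sketch what such an “If” command could look like: Vulkan’s VK_EXT_conditional_rendering extension skips the draws recorded between a begin/end pair whenever a 32-bit predicate in a buffer reads zero. A minimal sketch, assuming the extension is enabled and that some earlier GPU pass (not shown) wrote the predicate, e.g. by comparing a timestamp against the frame budget:

```cpp
#include <vulkan/vulkan.h>

// Sketch, assuming VK_EXT_conditional_rendering: draws recorded between
// begin/end are skipped if the 32-bit value at predicateBuffer+offset is
// zero. The extension's entry points must be fetched via vkGetDeviceProcAddr.
void RecordLowPriorityPass(VkCommandBuffer cmd, VkBuffer predicateBuffer,
                           PFN_vkCmdBeginConditionalRenderingEXT beginCR,
                           PFN_vkCmdEndConditionalRenderingEXT endCR) {
    VkConditionalRenderingBeginInfoEXT cond = {};
    cond.sType  = VK_STRUCTURE_TYPE_CONDITIONAL_RENDERING_BEGIN_INFO_EXT;
    cond.buffer = predicateBuffer;   // holds 0 (skip) or nonzero (draw)
    cond.offset = 0;

    beginCR(cmd, &cond);
    // ... record the low-priority draw calls here ...
    endCR(cmd);
}
```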

Simulators based on commercial off-the-shelf technology (like the ones I typically work on) use the fastest GPUs with professional features, especially large GPU memory. For example, the next simulator I’ll be working on uses the NVIDIA M6000 GPU; the last one (where I still do maintenance) uses the NVIDIA K6000. When those go out of production, we simply move up to the next best thing (technology insertion). Tuning to 60 Hz takes a very large percentage of the development cycle, especially when testing a new scene with, say, hundreds of different moving models. It gets even more complex when we start paging high-resolution imagery and terrain.

Maybe the best thing to consider would be pipeline debug modes for finding hotspots: modes that can be turned on and off at will during testing phases, and that don’t cripple performance in the non-debug, optimized case.
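One existing hook in that direction is the VK_EXT_debug_utils extension: you can label regions of a command buffer so capture and profiling tools (RenderDoc, for instance) attribute GPU time to them, and the labels cost essentially nothing when no tool is attached. A minimal sketch, with the extension’s entry points assumed to have been loaded via vkGetInstanceProcAddr:

```cpp
#include <vulkan/vulkan.h>

// Sketch, assuming VK_EXT_debug_utils is enabled: label a region of a
// command buffer so profilers/capture tools can attribute time to it.
void RecordTerrainPass(VkCommandBuffer cmd,
                       PFN_vkCmdBeginDebugUtilsLabelEXT beginLabel,
                       PFN_vkCmdEndDebugUtilsLabelEXT endLabel) {
    VkDebugUtilsLabelEXT label = {};
    label.sType      = VK_STRUCTURE_TYPE_DEBUG_UTILS_LABEL_EXT;
    label.pLabelName = "terrain paging + draw";   // shows up in the tools

    beginLabel(cmd, &label);
    // ... record the pass being investigated ...
    endLabel(cmd);
}
```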

You are all right, of course, that tuning is always required. I hope the next-generation API developers can, starting with the initial releases, provide tools to help developers better achieve, or at least estimate, deterministic performance.

Then I’d say your primary non-deterministic problems will likely be significantly reduced with Vulkan. Its low-level memory architecture makes streaming-type processes much smoother, with a lot less driver variation and interference. Of course, you have to accept the burden of greater responsibility, but I’m sure you can handle that.
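As a sketch of what that streaming control looks like (assuming memory allocated from a HOST_VISIBLE | HOST_COHERENT type, with error checking omitted): the memory can be mapped once at startup and kept mapped, so each frame’s upload is just a memcpy into a known address, with no driver guesswork in the hot path.

```cpp
#include <vulkan/vulkan.h>
#include <cstring>

// Sketch: persistent mapping for streaming. `memory` is assumed to come
// from a HOST_VISIBLE | HOST_COHERENT memory type.
void* MapOnceAtStartup(VkDevice device, VkDeviceMemory memory) {
    void* mapped = nullptr;
    vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, &mapped);
    return mapped;   // stays valid; no per-frame map/unmap in the hot path
}

// Each frame: copy new terrain/imagery data into this frame's slice.
void StreamFrame(void* mapped, VkDeviceSize frameOffset,
                 const void* src, size_t bytes) {
    std::memcpy(static_cast<char*>(mapped) + frameOffset, src, bytes);
}
```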