Official Vulkan Feedback: API for High-efficiency Graphics and Compute on GPUs

sylware · March 7, 2015, 6:40pm

As I said earlier… being able to perform the DMA commands manually, at the time the “3D engine” (the layer above Vulkan) wants, is performance-critical for discrete GPUs. Timeouts should be set by the layer above Vulkan, since only that layer (usually the 3D engine) knows the timing constraints.

I wonder how “GPU reset” and “GPU back from suspend” (vram is empty…) will be handled by the API in order to notify the layer above (3D engine) to reload vram.

I have my own AMD Radeon SI Linux driver, and as far as I could understand Vulkan… the “objects” of the Vulkan API are exactly the GPU “hardware” objects (descriptors, command queues, DMA engines…).

One nice thing is spir-v, which sends to hell the c++ fan-boys (I have to confess I was very skeptical about standardizing an IR for GPU).

MissingAFew · March 7, 2015, 7:02pm

I’m not good at waiting though… but I guess I have no choice. Trying to be ahead of the curve is difficult. ;(

Mannerov · March 8, 2015, 4:00am

I’ve seen the presentation videos of Vulkan and the slides.

As far as I understood, we can use several devices, even if they are not from the same vendor.
How is the ‘connection’ between the cards going to work?
I mean: is there going to be an open source part of the Vulkan driver that connects to all the Vulkan providers?

That would be great and prevent the issues OpenGL has under Linux with that (there has been some open source work by Nvidia to improve the situation, but it is far from complete: https://github.com/NVIDIA/libglvnd)

I think it would be really great if both the Vulkan dispatch and the WSI could be put in an open-source, cross-vendor library that would connect to all the specific Vulkan drivers. What’s the status on this?

Sirisian · March 8, 2015, 4:15am

Not sure if it’s been mentioned, but since Vulkan has so many industry partners with many smart minds would it be possible for Vulkan to include a standard windowing library? I’m thinking from the point of standardizing all tutorials and example code. Basically something that everyone would use from the beginner to the experienced to start their project from.

Overv161 · March 8, 2015, 4:34am

Where are the slides for the March 5 GDC talk? The link refers to the general overview and I am quite sure it originally linked to the correct file.

Gedolo2 · March 8, 2015, 4:36am

+1

There was a 60-slide presentation about Vulkan; I can’t find it anymore.

Gedolo2 · March 8, 2015, 5:05am

It would be great if Vulkan could work with the following:

Support for new generation of frame sync technologies
(These newer technologies have some advantages over V-Sync):
http://www.geforce.com/hardware/technology/g-sync
http://www.amd.com/en-us/innovations/software-technologies/technologies-gaming/freesync#about

These technologies can avoid tearing and improve image quality.
http://www.tomshardware.com/news/amd-project-freesync-vesa-adaptive-sync,27160.html
http://www.tomshardware.com/news/vesa-displayport-freesync-amd,28524.html
The first screens with support for G-Sync and FreeSync are already available, as are the necessary graphics card drivers, so this functionality can already be tested.

These frame sync technologies would be very interesting for video players in particular.
Being able to play every movie at its native frame rate avoids motion interpolation artefacts and allows simpler code.

Support for VR/AR headsets
Support for VR and AR headsets. (Including head movements.)
(VR: Virtual Reality, AR: Augmented Reality)
Many companies are bringing out such headsets. Support for this is basically having functions for stereoscopic rendering.
Having an advanced camera/viewer type able to have descriptions of doing either single or stereoscopic rendering would be very convenient for developers.

Alfonse_Reinheart · March 8, 2015, 8:07am

They’ve been made available again via Google Drive. I have no idea why the Khronos link changed.

Alfonse_Reinheart · March 8, 2015, 8:12am

“Beginners” have more needs from a windowing library than experienced developers making a real project. Beginners don’t care about the nature of the window besides “fullscreen” or “windowed” and basic resizing. Experienced developers do. Beginners will want input and other features; for experienced developers, such a library would be of no value.

The WSI system of Vulkan (discussed in the slides) should cover the needs of experienced developers, across multiple platforms.

Of course, there’s the question of whether Vulkan is even appropriate for a “beginner” to begin with…

greyfox · March 8, 2015, 8:33am

I think this should be left to third-party utilities. The goal of Vulkan is to be a hardware-appropriate abstraction. It shouldn’t, however, be an OS windowing system abstraction as well. I think that what we saw in the slides (render targets explicitly connected to the OS-specific presentation object) is probably the smartest way to go ahead. I am sure that someone will write a VulkanFW or similar for quick and easy cross-platform solutions.

Dark_Photon · March 8, 2015, 12:02pm

Often good info gets buried in videos and not disseminated, so here are a few snippets that caught my ear from:

The new Vulkan and SPIR-V specifications (youtube)

50 minutes in: Vendor “Vulkan performance” results (ARM, Imagination Tech, nVidia)

Vendor drivers not optimized yet, but even so…

- ARM: 79% reduction in CPU cycles spent in driver (up to 5X faster, if driver bound?)
- ImgTech: Significantly reduced CPU overhead: almost an order of magnitude (~8X-9X faster, if driver bound?)
- nVidia: Frame times (on a laptop):
  400ms (OpenGL)
  65ms (Vulkan) – 6X faster
  10ms (GL + NV_command_list) – 40X faster (Vulkan expected to match this)
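
For anyone checking the multipliers quoted above, here is a small sketch of the arithmetic. It assumes a fully driver-bound workload, so that a fractional reduction in driver cost translates directly into a speedup of 1 / (1 - reduction). The function names are made up for illustration.

```cpp
#include <cstdio>

// Speedup implied by a fractional reduction in cost: if the new cost is
// (1 - reduction) of the old cost, the speedup is 1 / (1 - reduction).
constexpr double speedup_from_reduction(double reduction) {
    return 1.0 / (1.0 - reduction);
}

// Speedup implied by two frame times measured in the same unit.
constexpr double speedup_from_times(double before_ms, double after_ms) {
    return before_ms / after_ms;
}

// Prints the multipliers behind the figures quoted from the talk.
void print_speedups() {
    std::printf("ARM, 79%% fewer driver cycles: %.2fx\n", speedup_from_reduction(0.79));    // 4.76x
    std::printf("OpenGL -> Vulkan: %.2fx\n", speedup_from_times(400.0, 65.0));              // 6.15x
    std::printf("OpenGL -> GL + NV_command_list: %.2fx\n", speedup_from_times(400.0, 10.0)); // 40.00x
}
```

So "up to 5X" for ARM is the 4.76x upper bound rounded, and it only holds if the driver is the sole bottleneck.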

It’s really encouraging to hear wide cross-vendor endorsement (mobile and desktop) for the performance advantages Vulkan brings.

mtmole · March 9, 2015, 10:18am

I went to the Vulkan info session at GDC going over some of the specifics of the API and brought up a question about GPU capability. Graham said there would be some sort of GPU info or parameters that you can grab when realizing the device, providing something along the lines of whether it is an immediate or tiled renderer. I feel that this is not enough and the driver should have the ability to provide more detailed preferences or metrics about preferred state change order and the stall potential of different actions. This would put more power in the hands of both the driver and the software dev to improve perf.

If the idea is to write the most streamlined API, this has to be a part of it. We’ve had to follow too many developer guidelines from different manufacturers about best practices for their hardware.

For instance, I tend to write code on mobile to take the most advantage of different GPUs based on empirical evidence. Something like this usually ensues:

    if (powerVR) {
        SortDrawsByMaterial();
    } else if (adreno || mali) {
        SortDrawsByVertexBuffer();
    } else if (tegra) {
        SortDrawsFrontToBack();
    }

This comes from more obvious things, such as many PVR chips (but not all!) having a higher cost for switching shaders, many Adreno chips (but, again, not all!) having a smaller vertex cache than usual, and Tegra, being an immediate renderer, typically needing as much Z-fail as it can get. This “guess and check” approach leaves edge-case hardware trailing, blocks out performance for new players, and makes for messy code that is not forward compatible.
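
As a rough sketch of what is being asked for here: the vendor checks could collapse into one switch if the driver reported its preferred draw ordering. Everything below is hypothetical. `PreferredSort`, `Draw` and `sortDraws` are invented for illustration, and nothing shown about Vulkan so far says such a hint will exist.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical driver-reported hint; not part of any announced Vulkan API.
enum class PreferredSort { ByMaterial, ByVertexBuffer, FrontToBack };

// Minimal stand-in for a draw call record.
struct Draw {
    int material;      // material / shader id
    int vertexBuffer;  // vertex buffer id
    float depth;       // distance from the camera
};

// Orders draws by whatever the (hypothetical) hint says the hardware prefers,
// instead of branching on the GPU vendor.
void sortDraws(std::vector<Draw>& draws, PreferredSort hint) {
    switch (hint) {
    case PreferredSort::ByMaterial:
        std::stable_sort(draws.begin(), draws.end(),
            [](const Draw& a, const Draw& b) { return a.material < b.material; });
        break;
    case PreferredSort::ByVertexBuffer:
        std::stable_sort(draws.begin(), draws.end(),
            [](const Draw& a, const Draw& b) { return a.vertexBuffer < b.vertexBuffer; });
        break;
    case PreferredSort::FrontToBack:
        std::stable_sort(draws.begin(), draws.end(),
            [](const Draw& a, const Draw& b) { return a.depth < b.depth; });
        break;
    }
}
```

The application would then call `sortDraws(draws, hint)` once per frame with whatever value the device reported, and no vendor names would appear in the code.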

Additionally, texture cache sizes and sampling rates enforce different decisions about render target size, atlasing, and filtering. How am I supposed to know, other than through bad perf testing procedures, that aniso is so much more expensive on specific hardware? Let it provide that information to me!

Thoughts?

Alfonse_Reinheart · March 9, 2015, 11:56am

Are you honestly asking for detailed specifications, including cache sizes and memory bandwidth, for every single GPU operation? How would a card describe bad anisotropic performance? What exactly is it about that card that makes the anisotropic performance “bad”, relative to some standard?

And what happens if a certain specification can’t be guaranteed? At best, bandwidth is a maximum value. It can be influenced by innumerable factors: how much contention there is for memory through that channel at that time, etc. It would be very easy to think that a high sampling bandwidth number automatically means “lots of big, simultaneous textures,” only to find out that it doesn’t mean that for other reasons.

In any case, predicting application behavior from raw specifications is always highly dubious. Many a developer has thought, based on a spec, that various operations would be too slow to use, or that other operations would be fast enough to hammer hard. And when they got the actual hardware, they would often find their assumptions to be completely wrong.

It’s one thing when you’re talking about elements that represent the basic nature of a renderer. TBRs are fundamentally different from the standard model. But the more information you provide, ironically, the less you really know about the hardware.

And even that all assumes implementations won’t be willing to lie to you.

Why would an implementation lie? Well, nobody wants it widely known by verifiable information that their texture memory controller has 20% less bandwidth than their competitors. Implementations will therefore have a good reason to inflate their numbers. Would anyone be able to tell the difference without doing “bad perf tests”?

Also, an implementation could lie due to stupid developers. For example, it’s entirely possible that an implementation might lie in order to force a popular application to use a certain codepath, because that codepath actually is faster on that hardware than the developer thought from their spec analysis. Developers aren’t perfect, and sometimes they’ll do the wrong thing.

You cannot effectively make accurate, a priori decisions based on information of dubious accuracy. Which means you’re still going to have to go and actually check to see if a certain set of rendering operations really is faster.

And if you don’t think IHVs will lie for either of these reasons, you’re way too trusting.

Furthermore, if forward compatibility is your concern, then detailed specs aren’t helpful to you. Consider a world where TBRs never existed. Then suddenly, someone comes out with one. Well, Vulkan’s API would have no way to tell you that it’s a TBR, and therefore you will assume there’s a problem because you see terrible write bandwidth. But TBRs don’t need huge write bandwidth, by their very nature. To fully understand the value, you would need to interpret the specification differently. But there’s no way to codify the notion of TBR in that API; you’d need some kind of extension, and you would need to radically change every application that uses this spec data.

At least by doing it Vulkan’s way, they have a single, extensible value that represents a particular kind of renderer. If a new one shows up, then it uses a new value, and developers will use a fall-back case until they learn how to do the right thing.
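
To make the fall-back argument concrete, here is a minimal sketch, assuming the renderer kind is exposed as a single enumerated value. The names `RendererKind`, `Strategy` and `chooseStrategy` are invented for illustration, and the strategy picked for each kind is only an example.

```cpp
// Hypothetical enumeration of renderer kinds; a new architecture would be
// added as a new value without changing the shape of the API.
enum class RendererKind { Immediate, Tiled };

// Hypothetical application-side strategies, for illustration only.
enum class Strategy { FrontToBack, MinimizeRenderTargetSwitches, Conservative };

// Maps a reported renderer kind to a strategy. Values the application does not
// recognize fall through to a safe default rather than breaking.
Strategy chooseStrategy(RendererKind kind) {
    switch (kind) {
    case RendererKind::Immediate:
        return Strategy::FrontToBack;
    case RendererKind::Tiled:
        return Strategy::MinimizeRenderTargetSwitches;
    default:
        // Unknown (newer) renderer kind: use the fall-back path until the
        // application is updated to handle it properly.
        return Strategy::Conservative;
    }
}
```

An application built today keeps working on a renderer kind introduced later; it simply runs the conservative path until someone adds a case for it.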

Remember: premature optimization is the root of all evil. And the only possible use for the kind of information you’re talking about is premature optimization. So I would say that the best thing you can do is continue to write code based on empirical evidence.

Sirisian · March 9, 2015, 3:18pm

Well having a windowing library with a modern input abstraction would be sufficient for both groups I’d imagine. Industry partners would be a good spot for defining such an abstraction as most engines implement rudimentary key mapping supporting a range of inputs, including raw inputs. They’ve pretty much all come to the same abstractions already so it’s not like there’s any hurdles.

I’d say beginners nowadays would definitely want borderless fullscreen and all the options an experienced developer would be implementing for customers. Some professional games still lack that which goes to show the difficulty even in the industry of doing proper windowing options. An abstraction created by industry professionals for a modern GPU API would really accelerate Vulkan’s adoption and dissemination of proper methods without getting caught up on trivialities.

I think the idea of separating beginners and experienced developers is also probably a flawed distinction. Really Vulkan shouldn’t be seen as beginner averse API as many beginners will be learning with it when it’s released. The goal really needs to be creating very clear tutorials. Ideally I should be able to write an example as a single commented file and paste that for someone to follow. Compiling and running it should start the program with no third party dependencies on every compiler and platform. Having that approachability I believe will be key to making Vulkan successful. It’s something they can do also, and I believe they should.

Alfonse_Reinheart · March 9, 2015, 4:35pm

[QUOTE=Sirisian]Well having a windowing library with a modern input abstraction would be sufficient for both groups I’d imagine. Industry partners would be a good spot for defining such an abstraction as most engines implement rudimentary key mapping supporting a range of inputs, including raw inputs. They’ve pretty much all come to the same abstractions already so it’s not like there’s any hurdles.[/QUOTE]

So you want Khronos to spend precious time and money developing… an input library. One that will benefit precisely zero actual game engines or developers (since they already have them).

Every programmer-hour and every dollar they spend on this input library is time and money not being spent on tools for serious developers: debuggers, profilers, conformance tests, etc. They are the ones who will decide whether Vulkan succeeds or fails. So I’d rather they focus on them.

Vulkan is not for everyone, and it’s not supposed to be. OpenGL will continue to be updated with features, so there’s no need for beginners to start coding graphics with Vulkan.

[QUOTE=Sirisian]Really Vulkan shouldn’t be seen as beginner averse API as many beginners will be learning with it when it’s released.[/QUOTE]

But it is a beginner-averse API, and no amount of niceties in the loader will change that fact. No person for whom the label “beginner” would apply is ready to handle Vulkan. Without implicit synchronization and other elements of APIs like OpenGL, there are just too many very complicated things for a beginner to learn.

The best you could do with a beginner is to teach them cargo-cult style: just do things in this order, and it will work. Cargo cult programming may work to get something on the screen, but it is very detrimental in the long run. It gives the impression that the beginner actually understands what they’re doing, rather than them simply copying bits of code around with no real understanding of what they’re doing.

[QUOTE=Sirisian]Ideally I should be able to write an example as a single commented file and paste that for someone to follow.[/QUOTE]

Encouraging copy-and-paste coding is helpful to no one, at any skill level.

ratchet_freak · March 9, 2015, 5:15pm

I see Vulkan becoming the “elite” graphics API that is the next step for OpenGL programmers,

unless there is a beginner-friendly Vulkan-based library that reduces the hello-world triangle to a few dozen lines of code rather than the 600 spoken of in the livestream, while still allowing enough low-level access to let advancing devs tinker with the specifics all the way down, as if the library wasn’t even there (pipe dream, I know).

Sirisian · March 9, 2015, 6:33pm

[QUOTE=Alfonse Reinheart;31087]
The best you could do with a beginner is to teach them cargo-cult style: just do things in this order, and it will work. Cargo cult programming may work to get something on the screen, but it is very detrimental in the long run. It gives the impression that the beginner actually understands what they’re doing, rather than them simply copying bits of code around with no real understanding of what they’re doing.[/QUOTE]
I think this point was mentioned in the presentation, but I think that’s precisely why beginners will jump toward Vulkan. The current OpenGL atmosphere has way too many unknowns and beginners will be looking for something that offers them a better understanding of what the GPU is doing.

Well that’s obviously not the goal, but showing working samples is sometimes nice to explain each step and the communication to the GPU.

The 600 lines mentioned don’t really say much about the complexity of the code. It could turn out to be fairly straightforward to follow. The extra lines could be more expanded steps that let a beginner better understand what’s happening, rather than staring at the abstraction and finding the steps in the documentation.

Alfonse_Reinheart · March 9, 2015, 9:39pm

Perhaps this would best be discussed on a different thread, to keep this one open for commentary on Vulkan itself.

MissingAFew · March 9, 2015, 9:49pm

Isn’t everyone a beginner when it comes to Vulkan? I consider myself a beginner and I think I work well when I’m forced to learn lower-level concepts and read through documentation. I don’t think people should just run away from it because the “experts” say not to bother. How else are people supposed to learn?

I see no reason to discourage people, unless you are trying to prevent competition.

ratchet_freak · March 9, 2015, 11:57pm

[QUOTE=MissingAFew;31093]Isn’t everyone a beginner when it comes to Vulkan? I consider myself a beginner and I think I work well when I’m forced to learn lower-level concepts and read through documentation. I don’t think people should just run away from it because the “experts” say not to bother. How else are people supposed to learn?

I see no reason to discourage people, unless you are trying to prevent competition. ;)[/QUOTE]

Multithreading and its pitfalls are not for beginners. Vulkan apps can still be single-threaded, but buffer interaction must be synchronized with the GPU explicitly.

Telling beginners that the upload happens asynchronously, and that the pointer must stay alive until the fence has been triggered, will lead to confusion unless they have previous experience with async I/O.
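
To illustrate the lifetime rule without any real Vulkan calls (the API isn’t public yet), here is a CPU-only analogy. It assumes a `std::future` can stand in for a fence and a background task for the GPU copy; `uploadAsync` and `uploadAndWait` are made-up names.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Stand-in for an asynchronous upload. It returns immediately; the copy out of
// `src` happens later on another thread. The returned future plays the role of
// the fence: it becomes ready once the copy has finished.
std::future<void> uploadAsync(const unsigned char* src, std::size_t size,
                              std::vector<unsigned char>& dst) {
    return std::async(std::launch::async, [src, size, &dst] {
        // Simulate the copy starting some time after the call returned.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        dst.assign(src, src + size);
    });
}

// Correct usage: the staging memory stays alive and unmodified until the
// "fence" has been waited on.
std::vector<unsigned char> uploadAndWait(std::vector<unsigned char> staging) {
    std::vector<unsigned char> gpuMemory;  // stand-in for device memory
    std::future<void> fence = uploadAsync(staging.data(), staging.size(), gpuMemory);

    // Freeing or resizing `staging` here would leave the background copy
    // reading through a dangling pointer.

    fence.wait();      // only after this point may `staging` be released
    return gpuMemory;
}
```

The bug a beginner would hit is releasing `staging` between the `uploadAsync` call and `fence.wait()`. Nothing fails at the call site, and the corruption shows up only later.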