Vulkan: for beginners?

I still think it’s a good idea, primarily because of the position Khronos is in at the moment. They have tons of industry partners with experience in such things. I believe such an effort would be a huge bonus for the community. It can be argued that it would be pointless since every industry professional has written their own, but that’s precisely the point of getting that kind of thing out in the open. It can also be argued that some random person, or multiple people, will write one a few weeks or months after release. My fear is that this will only slow the learning process by creating multiple competing ways to write tutorials and introductions. I know it might seem like a waste of time (assuming the same people who work on the API would work on it), but it’s early enough before release that I think it would be a feasible, albeit contentious, endeavor to include a windowing library at Vulkan’s launch.

That’s actually exactly what C++ exists for, to stop you from shooting yourself in the foot. C doesn’t stop you from shooting yourself in the foot. As for this whole Python vs. C++ comparison with OpenGL vs. Vulkan: I get what you are trying to say, but it’s not a very good comparison.

They don’t need the performance, so why would it matter? When they start advancing their engine, which will require more optimization, then it’ll matter. Sure, they should know that what they are doing is wrong, but it’s not on us to decide that for someone else. I’ve come across two kinds of people in my life: people who are willing to change and adopt new/better methods, and people who will use the first method they learned until they die because that’s the method they know; even if they are presented with a better solution, they simply won’t use it. I use OpenGL badly. I know it’s bad, and I probably lose a lot of performance by doing what I do with it, but I don’t need that extra performance. I still get 300+ fps with my terrible implementation. I use switch statements and for loops in one of my shaders. It simply is easier to use, and if I ever need that performance I can go back and maybe spend the hours it’ll take to iron out a better solution. For now it is good enough. Using macros with the preprocessor and compiling a bunch of different shaders for equivalent functionality would require well over 2000 shaders. I know that on my laptop the compilation of the shaders already takes over 2 minutes without even going over 100 shaders.
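
To illustrate what I mean by the macro approach, it would look roughly like this (a sketch; the feature flags are placeholders, not my actual shaders):

#include <string>

// Rough sketch of the macro/permutation approach: one source file,
// one compiled shader per flag combination. The flags are placeholders.
std::string makeVariant(const std::string& baseSource,
                        bool fog, bool normalMap, int lightCount)
{
  std::string src = "#version 330 core\n";
  if (fog)       src += "#define USE_FOG\n";
  if (normalMap) src += "#define USE_NORMAL_MAP\n";
  src += "#define LIGHT_COUNT " + std::to_string(lightCount) + "\n";
  return src + baseSource;   // every flag combination is a separate compile
}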

“Don’t tell me what I can’t do”, basically. OpenGL is probably one of the most painful APIs I’ve ever had to use, just because of the design principles behind it. I am going to ditch it as fast as I can, and I sure as hell am not going to DirectX, which is the only other alternative (for PC). I’d say beginners should learn Vulkan first; if it is too much for them, or they are not willing to learn concepts along the way, then that’s just their preference and how they choose to learn. So they might go to an alternative.

Exactly, performance improvements are for part 2 of the tutorial,

part 1 should be getting started and general concepts explaining what each thing does one step at a time.

lesson one would be initializing the basics, single command queue, no threading necessary. Just as simple as possible to get that triangle on screen (I doubt that would actually require 600 lines but we’ll see)

lesson two would be showing the triangle with the basic “no-op” shaders, single queue and creating the command buffer. Discussing the DMA and its async nature,

then start with the normal geometry and lighting stuff and advancing the shaders.

part 2 would then introduce tessellation and geometry shaders, threading, multiple queues, synchronizing them, compute shaders, etc.

[QUOTE=ratchet freak;31129]You will still need to discuss memory transfer and barriers but that can be contained to just the asynchronous IO concepts. If they understand that while IO is busy (between when the writeAsync call is made and when checkfinished returns true) changing the contents of the buffer passed to writeAsync will cause undefined results, then they will be able to understand that those are the barriers of the async API.

Then you will be able to explain that the entire command buffer will be executed asynchronously and the barriers will prevent advancing of the queue until user code calls vkBarrier(barrierObject); to ensure the buffer is fully populated by usercode and that the fence after the DMA will let user code test if the result is done and then use the buffer for something else. (I am making assumptions about what barriers and fences are here)[/quote]

Just to clarify: when I was referring to multithreading, I was specifically talking about the GPU/CPU asynchronous processing. I should have been more specific on that.

OK, so you explain that the command buffer executes asynchronously. So their first question will be “what does ‘asynchronously’ mean?”

You now must stop and talk about what it means to have multiple lanes of execution happening simultaneously. Then you have to talk about contention for resources and the problems therein.

After that, you can talk about your fence call. But you can’t talk about your vkBarrier call without first introducing the problem of visibility (assuming vkBarrier is about memory visibility). After all, you said that the fence would be enough to ensure that the DMA would finish before the rendering stuff. So if the DMA has finished, why do they need that vkBarrier call?

Because it will ensure that there is visibility between the writing process and the reading process. And to get them to understand that, they need to understand about memory caches, forcing out cache lines, etc. And that the memory barrier will ensure that any appropriate GPU caches are cleared.

OK, to be completely fair… you could probably avoid the last one. But you would be replacing it with, “that’s just what Vulkan requires”. Which is certainly functional, but hardly illuminating to the programmer.

[QUOTE=ratchet freak;31134]part 1 should be getting started and general concepts explaining what each thing does one step at a time.

lesson one would be initializing the basics, single command queue, no threading necessary. Just as simple as possible to get that triangle on screen (I doubt that would actually require 600 lines but we’ll see)

lesson two would be showing the triangle with the basic “no-op” shaders, single queue and creating the command buffer. Discussing the DMA and its async nature[/quote]

Command buffers are not optional. You can’t submit commands of any kind without a command buffer. So your lesson one has to include them.

Memory access and modification will either be via DMA or via mapping (likely persistent ala ARB_buffer_storage). But either way, you have to introduce one of them in lesson one. Which means you must introduce async, or just say, “just do this for now.”

So that’s two big topics that your lesson one needs to include.
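
For reference, the “persistent mapping ala ARB_buffer_storage” option mentioned above looks roughly like this in today’s OpenGL (a sketch with a placeholder size, assuming a GL 4.4 context and loaded entry points):

// Sketch of the persistent-mapping option (ARB_buffer_storage / GL 4.4).
// Assumes a current GL context and loaded entry points; the size is a placeholder.
void createPersistentBuffer()
{
  GLuint buf = 0;
  glGenBuffers(1, &buf);
  glBindBuffer(GL_ARRAY_BUFFER, buf);

  const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
  glBufferStorage(GL_ARRAY_BUFFER, 1 << 20, nullptr, flags);   // immutable storage

  // The pointer stays valid for the buffer's lifetime; the catch is that the
  // application must ensure the GPU isn't reading a region it is overwriting.
  void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, 1 << 20, flags);
  (void)ptr;
}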

[QUOTE=cheery;31130]I asked two questions:

  • Why prevent clueless Vulkan developers from forming?
  • How does dropping beginner documentation prevent clueless developers?[/quote]
  1. Because clueless programmers who remain so for long enough will become disillusioned programmers, rather than knowledgeable programmers. There have been many people who thought they could learn OpenGL from some crappy tutorial and online docs, failed miserably, and then stopped trying to learn graphics programming altogether. It’d be a far more productive use of everyone’s time if these people were warded off before they started.

  2. Your question makes no sense. You can’t drop something that doesn’t exist yet. Nobody has discussed “dropping beginner documentation”; what we’re talking about is whether Khronos should write “beginner documentation” (which I take to mean in-depth, beginner-friendly tutorials or books that assume no prior graphics knowledge).

And I remind you: “beginner documentation” and general documentation are not the same thing. General documentation will almost certainly exist, primarily as reference docs. Beginner documentation would be things like tutorials for people who don’t already understand graphics programming. Vulkan tutorials for people who already understand graphics programming is not “beginner documentation.” At least, not in the context of this conversation.

My point is that Vulkan requires significantly more, a 2x increase if not 3x. I even gave an example with Hello Triangle. OpenGL required 4 concepts, while Vulkan required those 4 plus 5 of its own.

That’s a fair point. Or rather, it would be… if the book in question weren’t called “The OpenGL Superbible”. A reader would expect to learn OpenGL from such a book, not gltools.

If it were called “The Graphics Programming Superbible”, then the API it uses would be irrelevant.

What I said was, “Vulkan’s additional concepts are a hindrance to that goal [of learning graphics]; they’re needless noise, arbitrary hoops you have to jump through to make your code work.” I don’t know how you go from that to “I think parts of Vulkan will be ‘needless’.”

What makes things useful depends on the goal and the circumstances. A mountain-bike is more useful than a car, if you happen to be in the middle of a forest with no roads around.

Vulkan has a lot of powerful features that are very useful for a skilled programmer who is interested in attaining the highest performance possible. For a newbie, these same powerful features are a hindrance for learning how graphics works. Just like with C++, naked pointers are a powerful tool in the hands of a knowledgeable programmer. In the hands of a novice, they’re a confusing and highly dangerous construct (which is why safer languages like Python, Java, etc don’t give them to you).

I think you just need to understand that not all motivations for people looking to learn Vulkan are equally reasonable.

Some people who are investigating Vulkan have an objective need for Vulkan. Maybe they have applications with performance needs, those needs aren’t being met, and Vulkan will meet them. Maybe they have console applications whose back-end renderer looks similar enough to Vulkan that porting to it is easier than porting to something like OpenGL. Things like this represent considered, learned, and informed opinions based on objective facts.

Other people have a purely personal interest. Hobby graphics programmers are always interested in the next new thing. Maybe they’ve investigated these low-level APIs, and they want to learn the concepts behind them. This is not an objective need; it’s a personal choice. But even here, it is based on objective facts about Vulkan that the person has learned.

Then, there’s what we’ve been calling “beginning graphics programmers.” These people are not looking at Vulkan based on objective facts. They’re looking at it because it’s new. They see Vulkan as a replacement for OpenGL. They hear about the performance benefits and think that makes Vulkan objectively the best possible graphics API for them.

These beliefs are objectively incorrect. Vulkan may be new, but “new” doesn’t make it good for them. Khronos has clearly stated that OpenGL and ES will be updated alongside Vulkan; Vulkan is an addition, not a replacement. And while Vulkan will perform faster, that performance comes at a substantial burden on the user, and none of these “beginning graphics programmers” will stress the GPU enough to actually benefit from it. At least, not until they’ve stopped being beginners.

What you have is a group of people making decisions for the wrong reasons.

Khronos should make every effort to encourage groups 1 and 2. Thus, they should have appropriate reference documentation online. They should make certain that tools are available which will be appreciated by these people. And so forth.

Khronos should make no effort to encourage group #3, and where possible, those people should be funneled to something more appropriate for them. Yes, that won’t prevent some of them from doing it anyway. Nor will it prevent some of the “kinder” members of group 2 from creating materials for them. But it will at least keep some of them away, possibly sent to things that will be better for them in the long run.

You can’t stamp out idiocy, but you can at least put up a warning sign.

Outside of incoherent memory accesses in OpenGL, an OpenGL-based graphics programmer doesn’t have to know anything about how asynchronous the GPU is. Nor do you have to know about command buffers. So no, not all Vulkan concepts are universal for graphics.

How exactly does C++ stop you from shooting yourself in the foot? For example:


#include <string>

std::string &BadFunc()
{
  std::string foo("foo");
  return foo;   // foo is destroyed here; the returned reference dangles
}

The C++ standard is very clear: you are referencing an object whose lifetime has ended. That yields undefined behavior. But the standard lets you do it. Oh sure, compilers like Clang will probably detect it and issue a harsh warning. But the standard does not require a compiler to detect it and fail at compilation.

And that’s just one simple example where C++ offers no protection.

So I see very little evidence for C++ stopping you from shooting yourself in the foot. Oh yes, C++ has many features which, when used correctly and consistently, will prevent all kinds of problems. C++ can be used safely, but it requires positive effort from the user to keep everything correct.
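
To be fair, the positive-effort fix for that particular example is trivial; a minimal sketch:

#include <string>

// The "positive effort" fix here is trivial: return by value, so the caller
// owns its own string and nothing dangles. (A minimal sketch, not from the post.)
std::string GoodFunc()
{
  std::string foo("foo");
  return foo;   // moved or elided; no reference to a dead object
}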

By contrast, “safe” languages like Python require positive effort from the user to fail.

This is exactly like how OpenGL protects you from GPU synchronization, while Vulkan requires you to follow the rules. One requires positive effort for you to fail (or makes it flat-out impossible); the other gives you the tools needed to succeed, but won’t stop you from doing it wrong.

Will it? Not everyone needs the maximum performance possible (if that were true, languages like Java, C#, and Python would not exist). Sometimes, “good enough” is good enough.

Not all the time, of course. But beginning graphics developers who graduate from “beginner” generally are not working on something that needs that level of performance. Particularly since Vulkan is primarily about increasing CPU throughput (though GPU throughput goes up since the CPU can send it more work).

You seem to be undermining your own point. Not everyone needs Vulkan performance improvements, as evidenced by yourself. You clearly do not need Vulkan’s performance (though you’d probably welcome it if GL 4.6 accepted SPIR-V, so you could get some off-line partial compilation benefits), as your application, despite your admitted inefficiencies, is fast enough for your needs.

So what would you as a beginner have gained from learning Vulkan, had it been available? An API that you felt was cleaner by some definition (not that OpenGL could fit into any definition of “clean”)?

I think you’re misunderstanding the conversation here. At least on my end.

While I firmly believe that beginners shouldn’t start with Vulkan, I don’t personally care one way or the other. What matters to me is how much effort Khronos should be expending to encourage beginners to use Vulkan.

And my position is, because I believe beginners shouldn’t be starting with Vulkan, Khronos should expend zero effort to encourage or support them. So no Khronos-sponsored “beginning graphics programmer”-style tutorials (tutorials or concept documentation aimed at people who already know graphics are fine), no Khronos-written GLFW-for-Vulkan API, nada. The Vulkan SDK should be aimed squarely at the people who will be most successful with the API, and the people who will be the most responsible for the API’s success.

If beginners can use these tools and manage to get past Vulkan’s barrier to entry, so be it. But Khronos shouldn’t be coddling them or appealing to them.

And yes, I think they should encourage the use of OpenGL specifically for beginners.

[QUOTE=Alfonse Reinheart;31135]Just to clarify: when I was referring to multithreading, I was specifically talking about the GPU/CPU asynchronous processing. I should have been more specific on that.

OK, so you explain that the command buffer executes asynchronously. So their first question will be “what does ‘asynchronously’ mean?”

You now must stop and talk about what it means to have multiple lanes of execution happening simultaneously. Then you have to talk about contention for resources and the problems therein.

After that, you can talk about your fence call. But you can’t talk about your vkBarrier call without first introducing the problem of visibility (assuming vkBarrier is about memory visibility). After all, you said that the fence would be enough to ensure that the DMA would finish before the rendering stuff. So if the DMA has finished, why do they need that vkBarrier call?

Because it will ensure that there is visibility between the writing process and the reading process. And to get them to understand that, they need to understand about memory caches, forcing out cache lines, etc. And that the memory barrier will ensure that any appropriate GPU caches are cleared.

OK, to be completely fair… you could probably avoid the last one. But you would be replacing it with, “that’s just what Vulkan requires”. Which is certainly functional, but hardly illuminating to the programmer.
[/QUOTE]

Or you can let beginners not worry about synchronization and have them wait on the results right after starting each command (the await-async anti-pattern). That would be simpler to swallow and easier to expand upon.
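
As a rough sketch of that pattern (the exact calls are assumptions; substitute whatever the final API actually provides):

#include <cstdint>
#include <vulkan/vulkan.h>

// Rough sketch of the "wait right after starting each command" pattern.
// The exact calls are assumptions; substitute whatever the final API provides.
void submitAndWait(VkDevice device, VkQueue queue, VkCommandBuffer cmd, VkFence fence)
{
  VkSubmitInfo info = {};
  info.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
  info.commandBufferCount = 1;
  info.pCommandBuffers    = &cmd;

  vkQueueSubmit(queue, 1, &info, fence);                    // start the work...
  vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);  // ...and block until it is done
  vkResetFences(device, 1, &fence);                         // ready for the next command
}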

[QUOTE=Alfonse Reinheart;31135]
Command buffers are not optional. You can’t submit commands of any kind without a command buffer. So your lesson one has to include them.

Memory access and modification will either be via DMA or via mapping (likely persistent ala ARB_buffer_storage). But either way, you have to introduce one of them in lesson one. Which means you must introduce async, or just say, “just do this for now.”

So that’s two big topics that your lesson one needs to include.[/QUOTE]

You can just limit lesson 1 to the cross-project setup, resulting in just showing a window and allocating a single general queue; nothing would be submitted to the GPU at this point.

And then the specifics for the triangle (data upload, shader upload, building the command buffers (once for upload, once for drawing), submitting the commands to the queue and waiting on each of the commands to finish) would go in lesson 2.

Probably the most ludicrous comment so far. Aside from the fact that C++ basically allows you to shoot yourself in the foot as much as C does, it also allows you to shoot yourself in the foot with its OO features, such as the problems introduced by inheritance (e.g. diamond inheritance) that are not present as much in Java and C#, due to certain limitations (which can also be bad) and the restriction to interfaces in multiple inheritance.

Also, some people wrote strange reasons as to why Vulkan would be easier than OpenGL, such as “being closer to hardware” being more “intuitive”. I believe the opposite is actually the case. If you want to render something, you do not really care about the hardware as much as you do about the software, because the software is what, in the end, defines how the result looks on your screen. Higher levels of abstraction simplify the process of achieving results. And results are what most people want.

In the end it all boils down to what exactly the API will look like, and what a simple “draw a triangle with 3 differently coloured vertices, in screen space” application will look like in Vulkan. And also how complicated it would be to extend this to drawing lots of different objects in different ways. I am looking forward to seeing this, and I believe only then can we really be sure whether this is something only usable by AAA teams and engine developers with dedicated graphics programmers, or whether it can just as easily be used by small indie devs.

I completely understand Alfonse’s concerns regarding an influx of beginners and the resulting irritation on their side when encountering the difficulties of multi-threaded programming etc. … Also, a wiki is a must. The OpenGL wiki already helps a lot, and so do the forums, including Alfonse’s continuous support therein.


What do you tell the user about those vkCmdSetFence, vkCmdWaitFence and vkCmdPipelineBarrier calls? That it’s just a dance you have to do every time you issue command buffers? That you’ll explain it in 30-40 chapters, after you’ve covered all of the topics about actual graphics?

So… your first “lesson” is quite literally code that doesn’t do anything. It doesn’t even clear the screen, since that (assuming Vulkan works like Mantle) is a command given to a command buffer. So lesson one is to display garbage.

Do you expect beginners, flush with eagerness to do some graphics, to want to read the second “lesson” before they actually achieve something? That’s the big problem: in order to teach directly from Vulkan, you have to start out so slowly that most beginners are just going to leave and go elsewhere.

A slow start might work in the context of a book, where most readers will have already purchased it and thus are going to be willing to give you the benefit of the doubt. But in an online situation, where leaving is something that happens at the click of a button? You can’t take your audience for granted.

It’s hard enough to give readers a giant wall of text before you get to code. But to start with a “lesson” that teaches you a lot of minutiae that’s 100% setup work needed just to get to the actual graphics?

So your second lesson has to introduce buffers, memory objects, memory transfer, shader compilation, command buffer commands, command buffer execution, render passes, pipeline state, vertex array state, and the data flow through the rendering pipeline. Sure, it’s not as much if you had to also introduce Vulkan initialization, GPU devices and GPU queues. But it’s still a whole lot for a single “lesson”.

And the more you have before you start accomplishing something of substance, the harder it will be to keep your audience.

To add to that, let’s talk about “Hello, World” as a program.

This is the default standard starting program, for many reasons. But the two most important are:

  1. It accomplishes something. It causes something to happen.

  2. What is accomplished can be customized and modified by the user. This is, to me, the most important element of Hello, World.

The user gets to change the string. They can have it display any message they want. They can print multiple lines. They get to play around with it and see what the boundaries of their current understanding are.

Your proposed lesson one doesn’t achieve either of these. OK technically, it creates a window of a certain size. But it doesn’t even put anything in it. It can’t even draw a color into it; it displays either garbage or (thanks to robustness) black.

And it offers no chance for a user to play around with it, save changing the size. They can’t poke at data to see what happens. They can’t modify a shader or change viewport state or anything of that nature.

From the perspective of a user, it’s all sophistry.

[QUOTE=Alfonse Reinheart;31147]What do you tell the user about those vkCmdSetFence, vkCmdWaitFence and vkCmdPipelineBarrier calls? That it’s just a dance you have to do every time you issue command buffers? That you’ll explain it in 30-40 chapters, after you’ve covered all of the topics about actual graphics?
[/QUOTE]

There would only be a vkCmdSetFence and a vkFenceWait call, with the fence wait right after submitting (or, if it is available, just a vkWaitCmdComplete).

The first lesson could be optional; it would be setting up a single copy of most of Vulkan’s render-critical structures, each of which can then be accompanied by a single-line explanation. You can get into detail when you go and set up the second copy down the road.

The only exception would be making the actual window using the windowing API and passing the window ID

[QUOTE=Alfonse Reinheart;31147]
So your second lesson has to introduce buffers, memory objects, memory transfer, shader compilation, command buffer commands, command buffer execution, render passes, pipeline state, vertex array state, and the data flow through the rendering pipeline. Sure, it’s not as much if you had to also introduce Vulkan initialization, GPU devices and GPU queues. But it’s still a whole lot for a single “lesson”.

And the more you have before you start accomplishing something of substance, the harder it will be to keep your audience.[/QUOTE]

It’s not that much more than you see in an OpenGL 4.x lesson 1.

Though using a library that takes care of the initialization, GLSL-to-SPIR-V compilation and basic DMA (like a set of commands that mimics the async IO primitives) will probably be a major part of a beginner’s tutorial.
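
Something like this hypothetical wrapper interface is what I have in mind (every name here is made up for illustration; none of it is real Vulkan):

#include <cstddef>

// Hypothetical beginner-facing wrapper that mimics async-IO primitives.
// Every name here is made up for illustration; none of it is real Vulkan.
struct Upload;                                              // opaque in-flight transfer

Upload* uploadAsync(const void* data, std::size_t bytes);   // start the DMA (like writeAsync)
bool    uploadFinished(const Upload* upload);               // poll (like checkfinished)
void    uploadWait(Upload* upload);                         // block until the copy is done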

So how do you maintain visibility for memory changes? That’s what the barrier is for (assuming that barriers work like they do in OpenGL); without it, if you upload some data and wait for it to finish, you can’t necessarily know that the appropriate GPU caches have been cleared. That’s what the barrier does.
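
In OpenGL terms, this is the situation glMemoryBarrier exists for; a minimal sketch, assuming a GL 4.3+ context:

// The OpenGL version of the visibility problem (assumes a GL 4.3+ context;
// the counts are placeholders):
void computeThenDraw(GLuint groupsX, GLsizei vertexCount)
{
  glDispatchCompute(groupsX, 1, 1);              // compute shader writes an SSBO

  // Without this, the draw below may read stale data from GPU caches, even
  // though the dispatch itself has finished.
  glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

  glDrawArrays(GL_TRIANGLES, 0, vertexCount);    // vertex shader reads that SSBO
}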

If the point of teaching X before Y is that Y uses X and therefore requires X to be known, then X isn’t optional. Same goes here: if GPU devices, queues, and so forth are used by lesson two (and they are), then learning about them is not optional.

Well, let’s find out:

OpenGL: OpenGL context, OpenGL objects (unless you’re DSA-only), buffer objects, vertex array objects, shaders & compilation, data flow through the OpenGL pipeline, and framebuffer swapping. (7)

Vulkan: GPU device, GPU queues, buffers, memory objects & access, memory visibility, command buffers, WIS, render targets & render passes, pipeline state, vertex array state, shaders & compilation, data flow through the Vulkan pipeline, and command buffer synchronization. (13)

I’m not sure how almost double the material counts as “not that much more”. And I could even pad out the Vulkan list by listing all of the state that goes into the pipeline state object, since you must specify all of it explicitly (unlike OpenGL, where you can leave default values for most of it).

Then, are they learning Vulkan or your library? The GLSL-to-SPIR-V stuff (or some other X-to-SPIR-V system) would be used by any Vulkan programmer, so that doesn’t count as what I’m talking about. But unlike OpenGL, initialization in Vulkan is actually part of Vulkan’s API. So if you wrap it in some library, you’re teaching your library, not Vulkan. Oh, you can wrap the platform-specific window creation parts. But the actual creation of a Vulkan instance? That’s part of Vulkan.

If you wrap Vulkan’s DMA system behind a set of library functions or whatever, then you’re not teaching the user Vulkan. You’re teaching them your library.

And if the most effective way to teach an API is to hide it, I’d say that you’ve arrived at a contradiction. Which strongly suggests that your initial assumption, that Vulkan is appropriate for beginner learning materials, is not correct.

I’m not starting from the assumption that Vulkan is appropriate for beginners; I’m trying to design a beginner’s tutorial regardless of that question.

[QUOTE=Alfonse Reinheart;31151]So how do you maintain visibility for memory changes? That’s what the barrier is for (assuming that barriers work like they do in OpenGL); without it, if you upload some data and wait for it to finish, you can’t necessarily know that the appropriate GPU caches have been cleared. That’s what the barrier does.
[/QUOTE]

I am making the assumption that between successive command buffer executions on the same queue there is an implicit barrier. As in, the commands on the same queue see consistent memory, and a DMA followed by a render using that DMA’d buffer will see the DMA completed before starting the render. The same goes for waiting on the fence: once you have waited on the fence, the DMA is finished and you can treat the DMA’d memory on the GPU side as finished. Otherwise the fence is completely useless.

Technically you need a VAO in OpenGL 4.0 core, but you can just create & bind one and then pretend it doesn’t exist. Some tutorials do exactly that; I’m fairly certain some Vulkan stuff can be treated the same way.

[QUOTE=Alfonse Reinheart;31151]
Then, are they learning Vulkan or your library? The GLSL-to-SPIR-V stuff (or some other X-to-SPIR-V system) would be used by any Vulkan programmer, so that doesn’t count as what I’m talking about. But unlike OpenGL, initialization in Vulkan is actually part of Vulkan’s API. So if you wrap it in some library, you’re teaching your library, not Vulkan. Oh, you can wrap the platform-specific window creation parts. But the actual creation of a Vulkan instance? That’s part of Vulkan.

If you wrap Vulkan’s DMA system behind a set of library functions or whatever, then you’re not teaching the user Vulkan. You’re teaching them your library.

And if the most effective way to teach an API is to hide it, I’d say that you’ve arrived at a contradiction. Which strongly suggests that your initial assumption, that Vulkan is appropriate for beginner learning materials, is not correct.[/QUOTE]

You already have people learning SDL and GLUT. You can also create the DMA wrapper yourself and provide emulated glBufferSubData functions.

Given that you’re doing something that’s not likely to succeed, here’s some advice to make the best of a bad situation. Take it or leave it as you like.

  1. Start with clearing the screen. It’s not as good as a triangle, but it at least has some value the user can play around with, while avoiding the stickier elements (memory allocation, management, DMAs, synchronization & fences, the rendering pipeline, etc). If you need to establish a pipeline state to clear the screen, avoid talking about it in any depth.

The main thing about this is that it exercises all of the primary components of the Vulkan API: the instance, GPU devices, GPU queues, command buffers, command buffer execution, the WIS, the WIS presentation image, and render passes/render targets (assuming clearing an image requires that). The reader gets a general overview of these components, but without a lot of the details that will be forthcoming.

  2. Follow this up with a giant wall of text, covering concurrency, memory visibility, and an overview of the rendering pipeline (including shaders). This should have zero code; it’s about introducing the reader to those concepts.

  3. Now, you get to Hello, Triangle. Here, you have to deal with buffers, memory allocation, memory transfer, shaders, pipeline state, and render passes (assuming clearing didn’t require this). However, you also need to build on the prior topics. So attack concurrency in memory transfers head-on. Also, show how data flows down the pipeline and is processed.

  4. Animating the triangle, along with multiple buffering. This expands on concurrency issues, as you have to deal with the fact that multiple rendering sequences may be in flight simultaneously. But it also requires creating internal image buffers yourself.

You assume that the DMA and rendering calls will be on the same queue. But we know from the presentation that they don’t have to be. There are several types of GPU queues: compute, rendering, and DMA. And while the slides do say that a particular queue can support multiple kinds of operations, they don’t say that this is required. Some hardware could have a separate DMA engine from its rendering engine, and therefore your application must send DMA commands via the other queue. Some hardware may not have separate queues, so for them, DMA commands must go into the same queue as rendering commands.

Your software has to be designed to work with both cases.

So, your code is going to have to introduce both possibilities, as well as show how to handle them.
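
A rough sketch of what that queue selection might look like (the function and flag names here are assumptions about the eventual API):

#include <cstdint>
#include <vector>
#include <vulkan/vulkan.h>

// Sketch: pick a queue family for DMA-style transfers, falling back to the
// graphics family when the hardware exposes no separate transfer engine.
// The function and flag names are assumptions about the eventual API.
uint32_t pickTransferFamily(VkPhysicalDevice gpu)
{
  uint32_t count = 0;
  vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
  std::vector<VkQueueFamilyProperties> families(count);
  vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, families.data());

  uint32_t graphicsFamily = UINT32_MAX;
  for (uint32_t i = 0; i < count; ++i) {
    const VkQueueFlags flags = families[i].queueFlags;
    if ((flags & VK_QUEUE_TRANSFER_BIT) && !(flags & VK_QUEUE_GRAPHICS_BIT))
      return i;                      // dedicated DMA/transfer engine
    if (flags & VK_QUEUE_GRAPHICS_BIT)
      graphicsFamily = i;            // graphics queues can also do transfers
  }
  return graphicsFamily;             // unified hardware: share the queue
}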

Also, I would be very hesitant about assuming that starting a new command buffer is the equivalent of a barrier. It could be, but that would also introduce needless overhead for command buffer executions that don’t need that barrier. I’m not saying that the assumption is definitely wrong, only that it’s probably not the choice that a low-level API like Vulkan would make.

Lastly, waiting on a fence has nothing to do with memory visibility. This is one of the reasons why you have to know so much about concurrency to understand Vulkan.

In concurrent systems (of any kind), if “thread” 1 wants to read a value that thread 2 is going to write, then clearly thread 1 cannot try to read that value until thread 2 has written it. In a language like C++, you would use a mutex or similar construct to ensure this. Thread 2 writes the data, then releases a lock. Thread 1 acquires the lock, which is only possible if thread 2 released it, and reads the written value.
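
In C++ terms, that pattern looks something like this (a minimal sketch with placeholder data):

#include <mutex>
#include <string>
#include <thread>

std::mutex gate;
std::string shared;       // written by thread 2, read by thread 1
bool ready = false;

void writer()             // "thread 2"
{
  std::lock_guard<std::mutex> hold(gate);
  shared = "data";        // write the value...
  ready = true;           // ...then release the lock (end of scope)
}

void reader()             // "thread 1"
{
  std::string copy;
  for (bool done = false; !done; ) {
    std::lock_guard<std::mutex> hold(gate);   // acquiring the lock is only
    if (ready) {                              // possible after the writer
      copy = shared;                          // released it; ordering and
      done = true;                            // visibility both come with it
    }
  }
}

int main()
{
  std::thread t2(writer), t1(reader);
  t2.join();
  t1.join();
}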

However, if you look lower level than this, you find that C++ and similar APIs have actually put two different tasks into one. At the level of the CPU, caches confound matters. Since each core has its own private caches, if Thread 2 wrote a value, that value may not be in a place that Thread 1 can read by the time it gets to reading it. So even if you ensure ordering of the write/read, you haven’t yet ensured visibility of the written value.

That’s what memory barriers are all about. Mutexes hide the need for this by issuing a memory barrier internally. I highly doubt that Vulkan fences will, since OpenGL fences don’t. I seriously doubt that Vulkan would make its fences more heavyweight than OpenGL’s, for several reasons.

The most important reason for lightweight fences is that GPUs have lots of caches. And you don’t want a fence to invalidate all of them; the user’s particular use case may only need to invalidate certain kinds. So you force the user to be specific about which ones.

These are selected by the memory barrier command.
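
As a sketch of what “being specific” might look like in a buffer memory barrier (the fields and flags here are assumptions about the eventual API, in the spirit of the guessing in this thread):

#include <vulkan/vulkan.h>

// Sketch of "being specific about which caches": after a DMA write into a
// vertex buffer, flush only what the vertex-fetch stage needs to see.
// Field and flag names are assumptions about the eventual API.
void barrierAfterUpload(VkCommandBuffer cmd, VkBuffer vertexBuffer)
{
  VkBufferMemoryBarrier barrier = {};
  barrier.sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
  barrier.srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT;         // what was written
  barrier.dstAccessMask       = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;  // who must see it
  barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
  barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
  barrier.buffer              = vertexBuffer;
  barrier.offset              = 0;
  barrier.size                = VK_WHOLE_SIZE;

  vkCmdPipelineBarrier(cmd,
                       VK_PIPELINE_STAGE_TRANSFER_BIT,      // producing stage
                       VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,  // consuming stage
                       0, 0, nullptr, 1, &barrier, 0, nullptr);
}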

Fences are not “completely useless” just because they don’t include the barrier. They are useful for many things; just not for the one you’re trying to do here :wink:

The reason OpenGL 4.x allows you to ignore the existence of VAOs is because of its “bind-to-edit” rules. You’re modifying an object’s state without knowing about it. We know Vulkan doesn’t include that nonsense (nor should it), so you can’t work with their vertex format state object without knowing that it exists.

While I have no definitive evidence for what I’m about to say, I do have some circumstantial evidence for it.

Here are the basic facts. The pipeline state object is said to be immutable upon creation. But at creation time, the user can say that specific parts of it are allowed to be mutable. This mutation happens by binding small “dynamic” state objects to the pipeline.

So the question is: are these small “dynamic” objects themselves mutable? Or are they immutable, like D3D11’s state objects?

My guess is that they’re immutable. This is based primarily on the fact that the slides keep talking about how “immutable” batches of work are good for the GPU. Plus, I’m pretty sure Mantle worked based on immutable state objects.

If Vulkan’s state objects are immutable, then what you’re suggesting wouldn’t generally be possible. To change vertex formats, you’d need an object that specified that particular vertex format. And while you could create these initially and keep them around somewhere, you couldn’t really ignore the need to create it the way you can with VAOs. If you want to talk about what vertex formats mean, you have to talk about the object that encapsulates that state. Which means talking about the creation of it.

GLUT does not wrap a single OpenGL call. It only wraps OpenGL initialization, which is explicitly not part of the OpenGL API.

This is a technicality to be sure, but it’s a very important one.

All that cache coherency stuff is going to depend on how much of it Vulkan will expose. I don’t see that much of it happening, given that command buffers are meant to be large batches of draw calls where cache management can be optimized by the driver at construction time.

I expect that between buffer executions caches can be considered cleared/flushed and ready for the next command buffer, conforming to the “no dependencies between command buffers” mentioned in the presentation.

I thought the whole point of Vulkan is that the driver does not insert slow synchronization points outside of the application’s control.

Implicit synchronization between consecutive executions of command buffers in the same queue won’t slow it down. Just as, within a single thread, you don’t have to jump through cache-coherency hoops to get the expected output; or, in other words, you don’t have to worry about pipelined execution in assembly, because the compiler will insert the no-ops needed for the expected results. The same will probably be true for the queues: the driver will ensure the coherency of commands in one queue, and it is allowed to optimize the commands per buffer “offline” during creation, but it will have to insert a barrier between buffer executions.

I expect that, strictly speaking, a queue that doesn’t emit fences/barriers (and/or that no one waits on) is allowed to remain idle and never start executing its submitted commands.

But you do have to “jump through cache coherency hoops”. In threading systems, whether C11, C++11, Win32 threads, pthreads, etc., all of these systems make you “jump through cache coherency hoops”. They just hide them behind mutex lock/unlock function calls. They’re still there; the CPU requires them.

And if you’re aiming for low-level control, you don’t want to have some automated system do that for you. Because you might not need them (at that particular point).

For example, let’s say you are rendering a shadow map in command buffer A. And you have command buffer B which will use that shadow map to render the actual scene. Obviously, you must issue B after A. Furthermore, command buffer B needs to ensure coherency and ordering with all commands in A. Therefore, command buffer B will start with an appropriate memory barrier.

But it will also need to start with a synchronization. That is, it must say, “you cannot proceed until all commands from command buffer A have completed.” Well, that’s kinda bad, performance-wise. Because now, the GPU will have a full pipeline stall between A and B.

However, you can alleviate that by putting some distance between A and B. So let’s say we have command buffer C, which doesn’t use the shadow map (maybe it’s the UI or something). So logically, you’d issue these buffers in A-C-B order. Oh, you still need the synchronization. But because it’s based on when command buffer A completes (rather than simply all commands issued before now), odds are good that there won’t be a pipeline stall at all.
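
To put the A-C-B idea into rough code (the event and barrier calls here are assumptions about what the final API will provide):

#include <vulkan/vulkan.h>

// Rough sketch of the A-C-B arrangement with an explicit, targeted dependency.
// The event/barrier usage is an assumption about what the final API provides.
void recordShadowDependency(VkCommandBuffer cmdA, VkCommandBuffer cmdB, VkEvent shadowDone)
{
  // End of A: signal that the shadow-map writes are finished.
  vkCmdSetEvent(cmdA, shadowDone, VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT);

  // C records no event and no barrier; it never touches A's data.

  // Start of B: wait on A's event and make A's writes visible to B's shaders.
  VkMemoryBarrier barrier = {};
  barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
  barrier.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
  barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
  vkCmdWaitEvents(cmdB, 1, &shadowDone,
                  VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT,  // where A signals
                  VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,      // where B consumes
                  1, &barrier, 0, nullptr, 0, nullptr);

  // Submit in A, C, B order; C fills the gap, so B rarely has to actually stall.
}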

Here’s the problem. If Vulkan is as you claim it to be, the order to execute buffer C will effectively issue a full memory barrier, with regard to prior command buffers. But C doesn’t share any data whatsoever with A; it has absolutely no need for these memory barriers. Therefore, emitting them does nothing more than needlessly impairing the performance of C.

Your way causes the emission of two memory barriers (after A and after C), when the only one that is necessary is the one after C. That’s throwing away performance needlessly.

And I hear tell that Vulkan is all about not doing that :wink:

Low-level APIs are low-level. That entails a lot of responsibility on the writer of code. That’s the price of performance. According to a rep from LunarG, “We heard the comment that writing a Vulkan backend for an engine is pretty much like writing an OpenGL driver that only does what you want it to do.”

Having the API emit potentially-needless barriers for you doesn’t sound much like that.

[QUOTE=Alfonse Reinheart;31167]But you do have to “jump through cache coherency hoops”. In threading systems, whether C11, C++11, Win32 threads, pthreads, etc., all of these systems make you “jump through cache coherency hoops”. They just hide them behind mutex lock/unlock function calls. They’re still there; the CPU requires them.

And if you’re aiming for low-level control, you don’t want to have some automated system do that for you. Because you might not need them (at that particular point).
[/QUOTE]

That is only when you are using multiple threads; a single-threaded application doesn’t use any of that. So in the hypothetical single-general-queue scenario you don’t have to use any of it either.

[QUOTE=Alfonse Reinheart;31167]
For example, let’s say you are rendering a shadow map in command buffer A. And you have command buffer B which will use that shadow map to render the actual scene. Obviously, you must issue B after A. Furthermore, command buffer B needs to ensure coherency and ordering with all commands in A. Therefore, command buffer B will start with an appropriate memory barrier.

But it will also need to start with a synchronization. That is, it must say, “you cannot proceed until all commands from command buffer A have completed.” Well, that’s kinda bad, performance-wise. Because now, the GPU will have a full pipeline stall between A and B.

However, you can alleviate that by putting some distance between A and B. So let’s say we have command buffer C, which doesn’t use the shadow map (maybe it’s the UI or something). So logically, you’d issue these buffers in A-C-B order. Oh, you still need the synchronization. But because it’s based on when command buffer A completes (rather than simply all commands issued before now), odds are good that there won’t be a pipeline stall at all.

Here’s the problem. If Vulkan is as you claim it to be, the order to execute buffer C will effectively issue a full memory barrier, with regard to prior command buffers. But C doesn’t share any data whatsoever with A; it has absolutely no need for these memory barriers. Therefore, emitting them does nothing more than needlessly impairing the performance of C.

Your way causes the emission of two memory barriers (after A and after C), when the only one that is necessary is the one after C. That’s throwing away performance needlessly.

And I hear tell that Vulkan is all about not doing that :wink:

Low-level APIs are low-level. That entails a lot of responsibility on the writer of code. That’s the price of performance. According to a rep from LunarG, “We heard the comment that writing a Vulkan backend for an engine is pretty much like writing an OpenGL driver that only does what you want it to do.”

Having the API emit potentially-needless barriers for you doesn’t sound much like that.[/QUOTE]

They also said that drivers aren’t allowed to assume any order between buffer executions. That said, drivers will be able to check which buffers are being used in the current queue and in the submitted commands, by checking the active command buffer’s read/write needs, and insert barriers as needed to ensure coherency. So in the second example (A-C-B) the driver will auto-emit a barrier after A (to make the writes coherent) and a wait for it before B.

If there is only a single rasterizer unit in the GPU, then you need a (partial) pipeline stall between command buffers that use it, and you need to interleave the commands of multiple queues that use it (with stalls in between).