Layering APIs on top of Vulkan

It came up in another thread, but I wanted to talk specifically about this notion.

Vulkan is a very low-level API. And while that low-level nature will certainly be useful to some, not everyone really needs it. On the other hand, because Vulkan is so low-level, Vulkan drivers won’t be making very many decisions. So their behavior will be generally predictable, with far fewer of the bugs that plague, for example, OpenGL implementations.

So Vulkan makes for a good API on which to build other “higher-level-but-still-not-scene-graph-level” graphics APIs. This is similar to how SPIR-V works as a language for other languages to compile to. The language you make could be Java, C#, Python, or even Fortran. Or it could be a simple, low-level SPIR-V “assembly” language that abstracts away register assignment and type declarations, but nothing more than that.

So here are some possibilities I was considering:

Apple Metal

Yes, really. Of the 4 “next-gen” graphics APIs, it is the only one I would consider not to be derived specifically from Mantle. It doesn’t have Mantle-style descriptor sets, for one. But more importantly, it acts at a higher level of abstraction. It’s still well below OpenGL; it has explicit command queues and buffers, and it’s very much asynchronous.

But it manages memory for objects in an OpenGL fashion. Metal memory objects are inextricably tied to the uses of that memory; buffers and textures own their memory rather than renting it. Metal has no DMA queue or equivalent. It certainly has nothing like Mantle’s complex memory state system. In short, Metal abstracts memory resources much more than Mantle/D3D12/Vulkan.

And that makes the API much easier to use, for those for whom Vulkan-like memory management is nothing more than an annoyance.
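
To make that concrete, here is a minimal C sketch of what “renting” memory means in Vulkan (error handling omitted; find_memory_type is a hypothetical helper that picks a memoryTypeIndex from the allowed bits):

#include <vulkan/vulkan.h>

/* Hypothetical helper, defined elsewhere: chooses a memory type
   from the bits the implementation says are allowed. */
extern uint32_t find_memory_type(uint32_t allowed_type_bits);

/* In Vulkan, a usable buffer takes three separate steps:
   create the object, allocate the memory, bind them together. */
VkBuffer make_buffer(VkDevice device, VkDeviceSize size, VkDeviceMemory *out_mem)
{
    VkBufferCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
        .size = size,
        .usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT,
        .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
    };
    VkBuffer buffer;
    vkCreateBuffer(device, &info, NULL, &buffer);        /* 1: the object */

    VkMemoryRequirements reqs;
    vkGetBufferMemoryRequirements(device, buffer, &reqs);

    VkMemoryAllocateInfo alloc = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize = reqs.size,
        .memoryTypeIndex = find_memory_type(reqs.memoryTypeBits),
    };
    vkAllocateMemory(device, &alloc, NULL, out_mem);     /* 2: the memory */
    vkBindBufferMemory(device, buffer, *out_mem, 0);     /* 3: the binding */
    return buffer;
}

A Metal-style buffer creation, by contrast, is a single call that returns an object already backed by memory.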

Plus, since Metal is an actual API that exists independently of any Metal-on-Vulkan implementation, there’s more incentive to use such an implementation. And because Metal’s abstractions are mostly identical to Vulkan’s, you’d be unlikely to lose any significant performance from the layering.

The downside, of course, is that Metal is an Objective-C interface. So you’d either be writing a C++-ified version of it, or you’d limit yourself and your users to Objective-C.

You’d also need a Metal Shading Language to SPIR-V compiler.

MiniGL

I mentioned this idea in the other thread, but it bears some discussion. The idea is to take modern OpenGL (plus potential changes in 4.6/5.0/whatever the next version is) and strip out all of the redundant stuff (non-DSA object modification and querying). Also, take out things that are hard to implement or potentially slow (such as forcing the driver to accept arbitrary pixel formats).

What you’d be left with would be a solid, reasonable API.

One nice thing about this is that, since it’s OpenGL minus stuff, it would still run on a full OpenGL implementation. So if performance is something you need, you could avoid the likely weaker performance of a MiniGL-on-Vulkan layer and go with a full OpenGL driver. And if performance is not a primary concern for you, then the fact that it’d be an open-source project means that you can fix bugs in it at your leisure.

You could even go in and make changes to improve performance for your particular application.

Longs Peak

This was the “nicer, cleaner” OpenGL API that the ARB attempted to invent before 2008. Implementing it would be an interesting exercise. It would be a clean, cross-platform, immediate-mode-style API.

The downside here is that Longs Peak never existed as an API. The only “documentation” on it is a few preliminary PDFs with some basic examples of how to construct objects. As such, even if someone liked using it, it would be functionally no different from creating your own API.

So why limit yourself to Longs Peak when you could potentially go even farther? Like not basing the API on C, for example. Or giving it explicit command buffers, à la NV_command_list. Once you’ve stopped limiting yourself to APIs that currently exist, why restrict yourself to OpenGL-style APIs?

So what kind of API would make for a useful abstraction above Vulkan, but is still lower level than a real scene graph? What ideas do you have?

One decision that has to be made is how leaky the API will be: is the programmer allowed to bypass the API and talk to Vulkan (or Metal/D3D12, if it happens to get ported) directly?

One thing that would be nice to have in the API is pre-built pipeline states. Part of what makes OpenGL performance so unpredictable is that when you change some little thing (like the vertex layout), the driver may need to recompile the program. Making pipeline changes explicit makes it known that they’re expensive.
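
As a sketch of what that could look like (all names here are hypothetical, not from Vulkan or any real spec): the whole pipeline state is supplied up front, so the one call that can trigger a shader recompile is obvious.

#include <stddef.h>

/* Hypothetical layered-API sketch. Opaque handles: */
typedef struct Device Device;
typedef struct Pipeline Pipeline;
typedef struct CmdBuffer CmdBuffer;
typedef struct { int placeholder; } VertexLayoutDesc, BlendStateDesc, DepthStateDesc;

/* Everything that could force a recompile lives in the descriptor,
   so pipeline_create() is the only call that may be expensive. */
typedef struct {
    const void *vertex_spirv, *fragment_spirv;  /* SPIR-V binaries */
    size_t vertex_size, fragment_size;
    VertexLayoutDesc layout;
    BlendStateDesc blend;
    DepthStateDesc depth;
} PipelineDesc;

Pipeline *pipeline_create(Device *dev, const PipelineDesc *desc); /* expensive */
void cmd_bind_pipeline(CmdBuffer *cb, const Pipeline *p);         /* cheap */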

The second part is manual memory management; Vulkan’s is a bit too low-level for the level you’re envisioning. I can see the level of ARB_buffer_storage (you allocate a buffer of size X which may or may not be CPU-mappable, and it remains allocated until you delete it) being around the level that’s acceptable.
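
For reference, that level of abstraction looks like this in standard OpenGL 4.4 / ARB_buffer_storage (a sketch; a current context and loaded function pointers are assumed):

#include <GL/glcorearb.h>  /* or whatever loader header you use */

/* One immutable allocation, optionally CPU-mappable, that lives
   until glDeleteBuffers(). */
GLuint make_storage_buffer(GLsizeiptr size, void **mapped)
{
    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferStorage(GL_ARRAY_BUFFER, size, NULL,
                    GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);

    /* Because the storage is immutable, this persistent mapping
       stays valid for the lifetime of the buffer. */
    *mapped = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                               GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                               GL_MAP_COHERENT_BIT);
    return buf;
}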

Rendering should be asynchronous (duh!!), as in you submit commands to a queue that executes them one at a time (memory barriers get inserted automatically as needed), with the option of building command lists that you can batch up.

Back-dooring any API is usually exceedingly dangerous. I doubt Vulkan is going to have a robust state querying system like OpenGL (Mantle certainly doesn’t). So if someone’s playing with memory object state (or the Vulkan equivalent) behind the API’s back, there’s no way to know what it did or how to readjust itself based on the change.

OpenGL is quite explicit about when you “change the pipeline”. The only API issue in that regard is the whole “bind to edit” thing, which looks much like “bind to render”.

Also, I don’t recall any situation where an OpenGL implementation performs a full recompile of a program due to vertex layout changes. I know AMD’s implementations don’t actually have vertex pulling hardware, so their VAO internals build a bit of vertex shader code that they prepend to the existing vertex shader. But changing VAOs is still far from a full-on vertex shader recompilation. For one, I would expect that they not do things like dead-code-elimination and so forth.

Now, there have been cases in the past where uniform changes would invoke a shader recompile. That tended to be NVIDIA, though.

That wasn’t the form of “asynchronous” I was referring to. Yes, the GPU executes commands on its own time; even OpenGL or Ogre3D does that. What I was referring to was the memory and execution model. Who needs to take care of things if you issue a read/modify/write?

OpenGL presents a synchronous view of execution. But it cheats; it makes things appear synchronous while in reality allowing things to be as asynchronous as you allow. All of the next-gen APIs present a fully asynchronous view of execution; all synchronization must happen explicitly at the request of the user.
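
In Vulkan terms, “explicit” means the application itself records the dependency; the driver infers nothing. A sketch (assuming a command buffer is being recorded and the buffer was just written by a compute shader):

#include <vulkan/vulkan.h>

/* Sketch: the app must state the ordering between a compute-shader
   write and a later transfer read of the same buffer. */
void barrier_compute_write_to_transfer_read(VkCommandBuffer cmdBuf,
                                            VkBuffer buffer)
{
    VkBufferMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .buffer = buffer,
        .offset = 0,
        .size = VK_WHOLE_SIZE,
    };
    vkCmdPipelineBarrier(cmdBuf,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_TRANSFER_BIT,
                         0,            /* no dependency flags */
                         0, NULL,      /* no global memory barriers */
                         1, &barrier,  /* one buffer barrier */
                         0, NULL);     /* no image barriers */
}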

And it should be noted that the lack of an asynchronous execution model is why D3D11 command lists don’t work very well at their intended purpose. Different threads have to talk to the same objects, so there is a lot of synchronization going on. Also, command lists have to carry around a huge amount of metadata about which objects they work with and in what ways. Then, when executing that list, the driver has to go through that metadata and, using what it knows about what has previously been executed, issue appropriate memory barriers and the like…

[QUOTE=Alfonse Reinheart;31515]Back-dooring any API is usually exceedingly dangerous. I doubt Vulkan is going to have a robust state querying system like OpenGL (Mantle certainly doesn’t). So if someone’s playing with memory object state (or the Vulkan equivalent) behind the API’s back, there’s no way to know what it did or how to readjust itself based on the change.
[/QUOTE]
Fully opaque to the underlying API then.

Changing blending state will sometimes require a recompile of the fragment shader.

So DSA all the way for all editing of state (or just no global state whatsoever).

[QUOTE=Alfonse Reinheart;31515]
That wasn’t the form of “asynchronous” I was referring to. Yes, the GPU executes commands on its own time; even OpenGL or Ogre3D does that. What I was referring to was the memory and execution model. Who needs to take care of things if you issue a read/modify/write?

OpenGL presents a synchronous view of execution. But it cheats; it makes things appear synchronous while in reality allowing things to be as asynchronous as you allow. All of the next-gen APIs present a fully asynchronous view of execution; all synchronization must happen explicitly at the request of the user.

And it should be noted that the lack of an asynchronous execution model is why D3D11 command lists don’t work very well at their intended purpose. Different threads have to talk to the same objects, so there is a lot of synchronization going on. Also, command lists have to carry around a huge amount of metadata about which objects they work with and in what ways. Then, when executing that list, the driver has to go through that metadata and, using what it knows about what has previously been executed, issue appropriate memory barriers and the like…[/QUOTE]

Then there should either be double buffering, or the asynchronous nature should be exposed to the user. I believe it should be a mix: don’t touch in-use buffers while a render is in progress, but you can change the pipeline object for the next render. Fences will then be needed to signal the user that a buffer isn’t in use anymore. It would be similar enough to async I/O that some users are already used to it. You should still be able to submit multiple commands all using the same buffer, but from the first command’s submission until the fence triggers on the last, you are not allowed to touch that buffer.
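
In Vulkan, that pattern looks roughly like this sketch (error handling omitted; cmdBuf is assumed to be a recorded command buffer that reads the buffer in question):

#include <vulkan/vulkan.h>

/* Sketch: submit work that uses a buffer, keep preparing the next
   frame, then fence-wait before the CPU touches that buffer again. */
void submit_then_reuse_buffer(VkDevice device, VkQueue queue,
                              VkCommandBuffer cmdBuf)
{
    VkFenceCreateInfo fci = { .sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO };
    VkFence fence;
    vkCreateFence(device, &fci, NULL, &fence);

    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmdBuf,
    };
    vkQueueSubmit(queue, 1, &submit, fence);

    /* ... free to build pipeline state or record other command buffers ... */

    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    /* Only now may the CPU touch the buffers cmdBuf used. */
    vkDestroyFence(device, fence, NULL);
}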

The API should be allowed to batch and optimize commands it finds are commonly used together (like a set of DMAs followed by a render). But if you want to avoid the slowdown of the batching process, you should explicitly create command buffers.

[QUOTE=Alfonse Reinheart;31513]It came up in another thread, but I wanted to talk specifically about this notion.

Yes, really. Of the 4 “next-gen” graphics APIs, it is the only one I would consider not to be derived specifically from Mantle. It doesn’t have Mantle-style descriptor sets, for one. But more importantly, it acts at a higher level of abstraction. It’s still well below OpenGL; it has explicit command queues and buffers, and it’s very much asynchronous.

[/QUOTE]

This is going to be a bit “you say tomato, I say tomahto” but…

You say Metal works at a higher level of abstraction.

I say Metal works at the same level of abstraction on a much simpler piece of hardware. :)

Because Metal -only- runs on shared-memory devices, there’s no need for the complications that Mantle (and possibly Vulkan) puts developers through in managing multiple memory pools with different characteristics. There’s no DMA because you don’t need to DMA - VRAM is system memory.

(There is one glaring exception I’m leaving out - resource transitions, which abstract cache flushes - barely; since Metal doesn’t have these, we have to assume that the cache flushes are implicit and are detected from specific points in the call stream to Metal.)

So Metal on Vulkan would be more abstract than Metal on an iOS device; the Metal implementation would have to look at the API usage and manage memory pools and DMAs for all Metal objects. This is work that isn’t happening at all on iOS - on Metal I allocate an object and it just sits in memory, and I’m done.

Similarly, an author coding to Metal can assume that accessing the memory backing resources is going to be as cheap as you’d expect in a shared-memory system, because the API basically promises a shared-memory model (inherited from its one and only platform) 100% of the time.

cheers
Ben

I’ve just realized you’re not bound to graphics APIs. One could write an open-source implementation of OpenCL, for example. Most of the things from OpenCL 1.2 are likely to be straightforward to implement. The implementor would only need to figure out memory management and synchronization. Device-side dispatch from 2.0 is probably a hardware feature, for the most part. Less so SVM. In the worst case, one would need to translate the SPIR-V code to the form Vulkan compute shaders expect (or simply use HLSL instead of CL C).

On a side note, the compute part of Vulkan is probably what tutorials for beginners should start with. It would allow covering memory state transitions, memory allocation, and synchronization without needing to hide essential parts behind magical helper functions. One possible lesson could be to generate a BMP file with lots of colourful circles by directly writing pixels into a buffer.

No, you cannot.

The consumer of a SPIR-V shader provides a specific set of language capabilities. Vulkan implementations will, almost certainly, not offer the “Kernel” capability. Which means that the addressing model is restricted to Logical. That means you don’t get pointers, and OpenCL is kinda built around having those.

Hmm… that is an interesting way to start. In effect, you do graphics without employing the “rendering” pipeline at all.

Granted, you still need to cover a lot of stuff to get even that far. But doing it this way means that you’re able to do something of value, while postponing any discussion of descriptor sets/layouts, PSOs and the various dynamic state, the entire rendering pipeline, and perhaps even WSI until later.

Now the question is whether you can make graphics interesting enough, exclusively through compute, to last for a couple of tutorials (as it would take at least 2 before the user is even semi-comfortable with this sort of stuff).

[QUOTE=Alfonse Reinheart;37643]
Now the question is whether you can make graphics interesting enough, exclusively through compute, to last for a couple of tutorials (as it would take at least 2 before the user is even semi-comfortable with this sort of stuff).[/QUOTE]
Here’s how I see it.
Lesson 1:
Goal: to understand the general framework of Vulkan. It’s alright to use vkFinish-like functions without explaining them in depth, since they’re not really important for now. Helper functions will be used for context initialization and shader compilation. (A rough host-side sketch of this lesson’s flow follows the outline below.)

Initialize the Vulkan runtime and a GPU.
Create a compute queue; allocate a memory page in host-accessible device memory. Create a shader that fills a buffer with thread IDs.
Create one big command buffer that performs all of the manipulations required in this sample.
Submit it.
Map the buffer into host memory and printf it.
Clean up.

Homework: alter the shader in such a way that it writes thread_id squared instead.
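
The host-side flow of Lesson 1 might look roughly like this (a sketch only: every handle passed in is assumed to come from the lesson’s helper functions, and the shader is assumed to use local_size_x = 64):

#include <stdio.h>
#include <vulkan/vulkan.h>

/* Sketch of Lesson 1's host-side flow; error handling omitted. */
void run_lesson1(VkDevice device, VkQueue queue, VkCommandBuffer cmdBuf,
                 VkPipeline pipeline, VkPipelineLayout layout,
                 VkDescriptorSet set, VkDeviceMemory memory, uint32_t count)
{
    /* Record one big command buffer that does everything in this sample. */
    VkCommandBufferBeginInfo begin = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO };
    vkBeginCommandBuffer(cmdBuf, &begin);
    vkCmdBindPipeline(cmdBuf, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
    vkCmdBindDescriptorSets(cmdBuf, VK_PIPELINE_BIND_POINT_COMPUTE,
                            layout, 0, 1, &set, 0, NULL);
    vkCmdDispatch(cmdBuf, count / 64, 1, 1);   /* shader: local_size_x = 64 */
    vkEndCommandBuffer(cmdBuf);

    VkSubmitInfo submit = { .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
                            .commandBufferCount = 1,
                            .pCommandBuffers = &cmdBuf };
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
    vkQueueWaitIdle(queue);   /* the "vkFinish-like" call the lesson allows */

    /* Map the buffer into host memory and printf it. */
    uint32_t *data;
    vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, (void **)&data);
    for (uint32_t i = 0; i < count; ++i)
        printf("%u\n", data[i]);   /* expect 0, 1, 2, ... */
    vkUnmapMemory(device, memory);
}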

Lesson 2:
Goal: an in-depth explanation of how resource attachment works.
Create a shader:
DrawEllipse(int4 offset_and_radiuses, int3 color, int3_buffer output_image) {
    if point (thread_id.x, thread_id.y) is in ellipse (offset_and_radiuses)
        output_image[thread_id] = color;
}

The final application will let the user enter an indefinite number of circles to draw. After each draw, it maps the resulting image into host memory and saves it as a BMP or another elementary image format (a minimal BMP writer is sketched below).
Homework: create a shader for drawing rectangles; learn to switch between different shaders.
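
The “save it” step needs no library at all. A minimal uncompressed 32-bit BMP writer (a sketch, assuming BGRA pixels as mapped from the Vulkan buffer) is just a 54-byte header plus the raw pixels:

#include <stdio.h>
#include <stdint.h>

/* Stores a 32-bit value little-endian, as the BMP format requires. */
static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);       p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Writes width*height BGRA pixels as an uncompressed BMP.
   Rows are bottom-up, which is the BMP default. */
static void write_bmp(const char *path, const uint8_t *pixels,
                      uint32_t width, uint32_t height)
{
    uint32_t image_size = width * height * 4;
    uint8_t header[54] = { 'B', 'M' };      /* rest zero-initialized */
    put_u32(header + 2, 54 + image_size);   /* total file size */
    put_u32(header + 10, 54);               /* offset to pixel data */
    put_u32(header + 14, 40);               /* BITMAPINFOHEADER size */
    put_u32(header + 18, width);
    put_u32(header + 22, height);           /* positive height: bottom-up */
    header[26] = 1;                         /* color planes */
    header[28] = 32;                        /* bits per pixel */
    put_u32(header + 34, image_size);       /* raw image size */

    FILE *f = fopen(path, "wb");
    if (!f) return;
    fwrite(header, 1, sizeof header, f);
    fwrite(pixels, 1, image_size, f);
    fclose(f);
}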

Lesson 3:
Goal: in-depth synchronization.

Move the image buffer into an unmappable memory area. Explain why this is important.
Create another queue. Use it to issue DMA transfers instead of mapping as in the previous lesson (sketched after this lesson).
Use fences so the DMA queue stalls until the draw queue is done drawing, and vice versa. Explain why explicit sync is bad.

Homework:
Remove all vkFinish-like calls, using fences instead.
Play with different heaps; use fences to test the application’s performance when different memory areas are used.
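
The DMA step of Lesson 3 boils down to a buffer-to-buffer copy on the second queue, fenced before the host maps the staging copy. A sketch (all handles assumed created earlier; a real program would also need a semaphore or queue-ownership transfer to order this against the draw queue):

#include <vulkan/vulkan.h>

/* Sketch: copy the device-local image buffer into a host-visible
   staging buffer, then fence-wait before mapping. */
void copy_and_map(VkDevice device, VkQueue dmaQueue, VkCommandBuffer dmaCmdBuf,
                  VkBuffer imageBuf, VkBuffer stagingBuf,
                  VkDeviceMemory stagingMem, VkDeviceSize imageSize,
                  VkFence copyDone, void **pixels)
{
    VkCommandBufferBeginInfo begin = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO };
    vkBeginCommandBuffer(dmaCmdBuf, &begin);
    VkBufferCopy region = { .srcOffset = 0, .dstOffset = 0, .size = imageSize };
    vkCmdCopyBuffer(dmaCmdBuf, imageBuf, stagingBuf, 1, &region);
    vkEndCommandBuffer(dmaCmdBuf);

    VkSubmitInfo submit = { .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
                            .commandBufferCount = 1,
                            .pCommandBuffers = &dmaCmdBuf };
    vkQueueSubmit(dmaQueue, 1, &submit, copyDone);

    vkWaitForFences(device, 1, &copyDone, VK_TRUE, UINT64_MAX);
    vkMapMemory(device, stagingMem, 0, VK_WHOLE_SIZE, 0, pixels);
    /* ... hand *pixels to the BMP writer from Lesson 2 ... */
}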

From my armchair, I believe any programmer who has at least read Tanenbaum on computer architecture will know what it takes to proceed to actual graphics after these lessons. If there are people with more thoughts on what a Vulkan tutorial must include, perhaps we could create another topic for discussion.

See my response in the other thread.

I like the MiniGL option. Having said that, another approach would be to write a Gallium3D pipe driver targeting Vulkan. That would give you all the APIs supported by Mesa for a lot less work than a direct wrapper. I wrote about this elsewhere and gave links, so I won’t go on here.