macOS MoltenVK basic physical device features and geometryShader

Hi, I recently started learning Vulkan now that MoltenVK is open source and I can program for Vulkan on my 2017 MacBook Pro.

I’ve been following
and up until this point it has been pretty simple understanding things. I’m now trying to retrieve the physical devices and test them for the needed features. My MacBook Pro 2017 has 2 GPUs in it, Intel HD Graphics 630 (Integrated) and Radeon Pro 560 (Dedicated/Discrete).

The tutorial checks that a GPU is VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU and that the device’s features include the geometryShader flag.
Now I knew the Intel would fail the check because it’s integrated, but the Radeon is failing on the geometryShader check. Is this a known issue with the current AMD drivers on macOS and Vulkan at the moment?
I don’t know much about graphics (hence the learning), but I would expect a card such as the Radeon Pro 560 (one of the newest cards in MacBooks) to support geometry shading, especially as the tutorial has this line of code.

// Application can’t function without geometry shaders
if (!deviceFeatures.geometryShader)
    return 0;

I’m assuming it’s a super basic feature, no?

I did some searches on Google and on this forum and couldn’t find anyone reporting a similar issue.
Any help would be appreciated.

Oh, I should add that if I had kept on reading, the tutorial says “Because we’re just starting out, Vulkan support is the only thing we need and therefore we’ll settle for just any GPU:”… but I would still like to know if geometryShader is considered basic functionality, or if it’s something I’m likely to need soon after drawing a basic triangle.
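For anyone hitting the same check: one way to avoid rejecting every GPU on a MoltenVK system is to score devices instead of treating geometryShader as a hard requirement. The following is only a minimal sketch of that idea, not the tutorial's code. GpuInfo, rateDevice and pickDevice are hypothetical names, and GpuInfo is a plain stand-in for the two fields you would normally read from VkPhysicalDeviceProperties and VkPhysicalDeviceFeatures, so the example compiles without the Vulkan headers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the values the tutorial reads from Vulkan:
//   isDiscrete     <- deviceProperties.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
//   geometryShader <- deviceFeatures.geometryShader
struct GpuInfo {
    bool isDiscrete;
    bool geometryShader;
};

// Score a device instead of rejecting it outright. Optional capabilities
// add points; nothing here is a hard requirement. The weights are arbitrary.
int rateDevice(const GpuInfo& gpu) {
    int score = 1;                        // any enumerated device is usable
    if (gpu.isDiscrete)     score += 1000; // prefer the dedicated GPU
    if (gpu.geometryShader) score += 100;  // nice to have, not required
    return score;
}

// Returns the index of the highest-scoring device, or -1 for an empty list.
int pickDevice(const std::vector<GpuInfo>& gpus) {
    int best = -1;
    int bestScore = 0;
    for (std::size_t i = 0; i < gpus.size(); ++i) {
        const int score = rateDevice(gpus[i]);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With the two GPUs described above (integrated Intel and discrete Radeon, neither reporting geometryShader), pickDevice would select the Radeon rather than failing on both.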

MoltenVK does not currently support geometry shaders:

I don’t think the underlying Metal does either, so it makes some sense.

Generally speaking, most people struggle to find a use for Geometry Shaders. That’s not to say that they’re completely worthless, but the majority of their use cases have been superseded by other features: tessellation shaders (which admittedly Metal/MoltenVK also doesn’t support), compute shaders, and the ability to do layered rendering from vertex shaders.

There just isn’t much space left for problems that require GS-based solutions.

Yeah after poking around more online and chatting to a gfx programming friend, I realize GS often doesn’t have much use, and it was just a bad example to use in a tutorial imo… I’ll definitely be reading all the way to the end before implementing the code from that site in future… Thanks for all the replies.

But it was just an example of how to write your application so that it requires specific features of the GPU. It wasn’t suggesting that GSs were an essential feature of applications. Hence it saying “let’s say we consider our application only usable for dedicated graphics cards that support geometry shaders.”

Oh yeah, I realize this now after fully reading that tutorial page. It’s just that up to that stage of the tutorial, all the code written in the previous stages had been used, so I was focusing on the code and making sure it worked, and didn’t realize it was going to be throwaway code. That’s why I was saying it maybe wasn’t a great example for a tutorial. I would imagine most people don’t follow a tutorial that’s trying to teach the basics of something expecting to immediately throw that code away.

An important element of Geometry Shaders people struggle to emulate in Compute Shaders is Streaming Output Buffers - i.e. the ability to incrementally spit out geometry for rendering without accumulating all the output geometry data into a single large buffer object.

When we use Geometry Shaders, we create buffers of input geometry (direct or indexed), and issue a draw call. The geometry shader operates on each input primitive, and produces one or more output triangles, with potentially unique transformed coordinates. Those output primitives are streamed into the next pipeline stage.

If our draw call has 50,000 triangles, we hope Geometry Shaders don’t create a wasteful (50,000 * GL_MAX_GEOMETRY_OUTPUT_VERTICES) output buffer from the geometry shader. We hope they accumulate primitives in a buffer appropriate to the GPU thread-dispatch width, and dispatch it whenever it fills (though technically we don’t know what they do).

However, a simple Compute Shader implementation allocates an output buffer big enough to hold all results from the entire compute batch. This is because the compute shader is run to completion, then a draw call is issued with the output buffer.

If our drawing batch has 50,000 triangles, and each Compute Shader instance creates one triangle, our output buffer has to have space for 50,000 triangles. If each instance creates multiple triangles, this output buffer is (50,000 * max_geometry_per_cs_instance)! And this problem is amplified as command-buffer drawing allows us to put more and more work into a single draw call.
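To put a rough number on that, here is a small worked calculation. It is only a sketch under stated assumptions: outputBufferBytes is a hypothetical helper, and the 16-byte vertex (a single float4 position) and the factor of 3 output triangles per input (as in a 3-split PSSM) are illustrative values, not measurements.

```cpp
#include <cstddef>

// Worst-case size of a compute-shader output buffer that must hold every
// generated primitive for the whole batch before the draw call is issued.
// All parameters are assumptions supplied by the caller.
constexpr std::size_t outputBufferBytes(std::size_t inputPrimitives,
                                        std::size_t maxOutputPrimsPerInput,
                                        std::size_t verticesPerPrimitive,
                                        std::size_t bytesPerVertex) {
    return inputPrimitives * maxOutputPrimsPerInput *
           verticesPerPrimitive * bytesPerVertex;
}
```

For 50,000 input triangles, up to 3 output triangles each, 3 vertices per triangle and 16 bytes per vertex, outputBufferBytes(50000, 3, 3, 16) comes to 7,200,000 bytes, all of which has to exist before the draw starts.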

If we want to avoid allocating this output buffer on the fly, we pre-allocate the largest one we need, and then if it’s in use we either have to wait for a batch to finish, or we need more than one of them. This is a real inconvenience that Geometry Shaders free us from.

Two examples of this are the Geometry Shaders in Parallel Split Shadow Maps and Voxel Global Illumination. In PSSM, a geometry shader is used to clone geometry into the appropriate shadow-map splits (normally 3). In VXGI, a geometry shader is used to decide which projection axis makes the triangle coverage the largest in the 3d clipmap texture, and projects it onto this axis.

Whether using simple geometry buffers or Metal Indirect Command Buffers, all examples I can find still allocate and store a buffer for the whole compute call.

However, what do you do when your Compute Shaders are doing work on the input data, and producing completely unique output data, such as in PSSM or VXGI, that you don’t want to pre-allocate and store?

Is there some way in Metal 2 to have compute shaders directly issue streaming drawing commands (without CPU involvement) in temporary buffers that are proportional to GPU thread-width, not the full drawing batch size of each object? If so, how?

One way I can see to do something remotely similar with Compute Shaders is to manually split the draw-batch into sub-batches. For example, we could take a 50,000 primitive object, and subdivide it into multiple 2,000 primitive compute batch calls. When we issue a single compute batch, it now only needs a (2,000 * generated_primitives) output buffer. Each time a compute call finishes, we send the sub-batch output to drawing, and simultaneously issue the next compute batch. This only requires two output buffers, which we ping-pong. I do worry about CPU involvement and GPU stalls, but it might be better than allocating and consuming large temporary buffers.
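The sub-batch bookkeeping described above can be sketched on the CPU side as follows. This is only an illustration of the splitting and ping-pong assignment; SubBatch and planSubBatches are hypothetical names, and the actual compute dispatch, draw submission and synchronization between them are left out entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One compute dispatch plus the draw that consumes its output.
struct SubBatch {
    std::size_t firstPrimitive; // offset into the object's input primitives
    std::size_t primitiveCount; // primitives processed by this dispatch
    int outputBuffer;           // 0 or 1: which ping-pong buffer receives output
};

// Split `totalPrimitives` into dispatches of at most `batchSize` primitives,
// alternating between two output buffers so that one can be drawn from
// while the other is being filled.
std::vector<SubBatch> planSubBatches(std::size_t totalPrimitives,
                                     std::size_t batchSize) {
    std::vector<SubBatch> plan;
    if (batchSize == 0) return plan; // nothing sensible to plan

    int buffer = 0;
    for (std::size_t first = 0; first < totalPrimitives; first += batchSize) {
        const std::size_t count = std::min(batchSize, totalPrimitives - first);
        plan.push_back({first, count, buffer});
        buffer ^= 1; // flip between buffer 0 and buffer 1
    }
    return plan;
}
```

For the 50,000-primitive object with 2,000-primitive sub-batches, this yields 25 dispatches, and each output buffer only needs room for (2,000 * generated_primitives) instead of the whole batch.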

It’s important to remember that this thread is about MoltenVK, which runs on iOS hardware. And iOS is designed specifically for tile-based rendering hardware, which is hardware that typically wants to create a firm separation between the act of vertex generation and the act of rasterization. That is, they already buffer lots and lots of vertices (ideally, all of them) before trying to render tiles.

Now that being said, it would be useful to be able to have a compute shader write vertex data directly to the buffers that the TBR rasterizer uses, so that you don’t have to have a pass-through VS that just copies data from one place to another.