Missing features: Transform Feedback and SPIR-V DrawID?

ParticlePeter · March 9, 2016, 9:07am

I cannot find any information about Transform Feedback; am I missing something obvious? I found the combination with Geometry Shaders quite useful. Is there some other, more general feature, like DX11 append/consume buffers, that I am missing?

I am also quite surprised that ARB_shader_draw_parameters didn’t make it into the core spec; it is very useful for accessing indexed resources. What would be the reasoning for that?

Alfonse_Reinheart · March 9, 2016, 9:16am

I cannot find any information about Transform Feedback, do I miss something obvious? I found the combination with Geometry Shaders quite useful, is there some other more general feature like DX11 append/consumeBuffers which I am missing?

Yes. It’s called “doing it yourself”.

There’s nothing stopping you from having the GS write data to a storage buffer or image. You can even use the PrimitiveId and InvocationId to decide where each invocation should write to.

Vulkan doesn’t need transform feedback. And OpenGL hasn’t needed it since GL 4.2.

I am also quite surprised that ARB_shader_draw_parameters didn’t make it into core spec, very useful for accessing indexed resources. What would be the reasoning for that?

Probably the same reason that it didn’t make it into core OpenGL, even though it was released alongside OpenGL 4.5.

ParticlePeter · March 9, 2016, 12:39pm

[QUOTE=Alfonse Reinheart;39940]Yes. It’s called “doing it yourself”.

There’s nothing stopping you from having the GS write data to a storage buffer or image. You can even use the PrimitiveId and InvocationId to decide where each invocation should write to.[/QUOTE]

Sorry, I can’t follow. I don’t know how to remove arbitrary elements from a buffer and compute a new contiguous buffer from the remaining elements, all in parallel execution. And I don’t see how PrimitiveID and InvocationID could help; maybe you can point me to your source of knowledge?

That’s an opinion, like “the world doesn’t need C++”, which I am really convinced of, but I wouldn’t give it as a reply to a post that is looking for reasons. Still, good to know that there are different people with different angles.

Aaaah, THAT reason … hm … still can’t follow, sorry. In fact I am missing it in 4.5 as well. Maybe somebody else with another (sound) reason?

Alfonse_Reinheart · March 9, 2016, 1:21pm

I don’t know how to remove arbitrary elements from a buffer and compute a new contiguous buffer from the remaining elements, all in parallel execution.

OK, but that’s not what transform feedback does, so I don’t see how that matters. As I’m sure you’re aware, transform feedback is about marking outputs from a GS to be recorded into a buffer(s). This can be very easily done from a geometry shader.

It has nothing to do with removing any “Elements” from a buffer.

And I don’t see how PrimitiveID and InvocationID could help, maybe you can point me to your source of knowledge?

The source of my knowledge is thinking about the tools Vulkan provides.

OK, every time you call EmitVertex in a GS that’s using transform feedback, the implementation takes the data in your output variables and writes them to the various buffers bound for feedback. That’s something that you can do yourself by writing to an SSBO. The only issue is figuring out where in the SSBO’s array to write to. This will be based on your particular invocation, relative to other GS invocations in the rendering command.

That’s where PrimitiveID comes in. If your GS writes 3 vertices in each GS invocation, then you know that the invocation for PrimitiveID 0 will write to locations 0, 1, and 2 in the SSBO array. PrimitiveID 3’s invocation will write to locations 9, 10, and 11. And so forth.

If you’re instancing your GS, then you need to combine PrimitiveID with InvocationID: multiply PrimitiveID by the number of invocations per primitive, then add InvocationID.
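As a rough sketch of that index arithmetic, here is a CPU-side simulation in plain Python (not shader code). The function and parameter names are made up for illustration, and it assumes every invocation writes the same fixed number of vertices:

```python
def output_slots(primitive_id, invocation_id,
                 invocations_per_primitive, verts_per_invocation):
    """Return the SSBO array indices one GS invocation would write to."""
    # Flatten (primitive, invocation) into a single invocation number first.
    flat_invocation = primitive_id * invocations_per_primitive + invocation_id
    # Each invocation owns a contiguous run of verts_per_invocation slots.
    base = flat_invocation * verts_per_invocation
    return list(range(base, base + verts_per_invocation))


# Non-instanced GS emitting 3 vertices per invocation.
print(output_slots(0, 0, 1, 3))  # [0, 1, 2]
print(output_slots(3, 0, 1, 3))  # [9, 10, 11]

# Instanced GS with 2 invocations per primitive.
print(output_slots(3, 1, 2, 3))  # [21, 22, 23]
```

Because every (PrimitiveID, InvocationID) pair maps to its own run of slots, no two invocations ever write to the same location, so no synchronization is needed in this fixed-size case.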

Now, if you’re conditionally writing with your GS, then we need to talk a bit more about what data you’re conditionally writing. If the data you’re writing is unordered (that is, it doesn’t really matter what order you write it in, so long as it’s tightly packed; this is common for frustum culling), then you can use an atomic counter to compute the index. Each invocation increments the atomic counter if it is going to write data.

Of course, if you don’t need order, then you ought to be using a compute shader. Don’t pretend frustum culling is a rendering operation.
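The atomic-counter approach can be sketched on the CPU like this. It is only a simulation: threads stand in for shader invocations, a lock-protected integer stands in for the GPU atomic counter, and `AtomicCounter`, `compact`, and `keep` are hypothetical names:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AtomicCounter:
    """Stand-in for a GPU atomic counter; fetch_add returns the old value."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, amount=1):
        with self._lock:
            old = self._value
            self._value += amount
            return old

    @property
    def value(self):
        with self._lock:
            return self._value


def compact(items, keep):
    """Write every item that passes keep() into a tightly packed output."""
    items = list(items)
    counter = AtomicCounter()
    out = [None] * len(items)  # worst case: every item survives

    def invocation(item):
        if keep(item):
            slot = counter.fetch_add(1)  # reserve a unique output index
            out[slot] = item

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(invocation, items))
    return out[:counter.value]


survivors = compact(range(10), lambda x: x % 3 == 0)
print(sorted(survivors))  # [0, 3, 6, 9]
```

The output is tightly packed, but the order of the survivors depends on which invocation reaches the counter first, which is why this only suits data where order doesn’t matter.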

And if we’re going to be fair, I did think of one case where transform feedback would be needed, but it’s hardware specific. It would be for hardware that is incapable of doing SSBO writes from any shader stages other than fragment and compute, but would otherwise be capable of TF. And it would seem thus far that no mobile hardware allows it (nor does most Intel hardware).

At the same time, I don’t think they should add hardware features that exist solely because of hardware that will be increasingly outdated. TF is a wart in the OpenGL API because you can do the job yourself.

Aaaah, THAT reason … hm … still can’t follow, sorry. In fact I am missing it in 4.5 as well. Maybe somebody else with another (sound) reason?

It’s the usual reason why ARB extensions aren’t instantly part of core OpenGL. It’s why ARB_sparse_texture and ARB_bindless_texture never made it to core: because not all 4.x hardware can support them.

Shader draw parameters requires that an implementation be able to provide 3 pieces of data to the VS:

The base instance value. Since Vulkan uses InstanceIndex instead of InstanceID, this is unnecessary. The InstanceIndex has the base instance added to it already, so the shader doesn’t need a separate value for the base instance.
The base vertex index. Again, since VertexIndex already includes the base vertex index, there’s no need for a separate value.
A counter that increments for each drawing command. That one requires hardware support, since it must also work with multi-draw indirect commands (vkCmdDrawIndirect with a drawCount greater than 1).

So Vulkan already has 2/3rds of the data. And the last 1/3rd requires direct hardware support. Admittedly, they could have added that as an optional feature, but at some point, you’ve got to ship the thing out the door. DrawID alone isn’t all that important.

Especially since Vulkan only permits multidraw operations with indirect rendering commands. So if you need the equivalent of DrawID, you can always use the base instance if the hardware allows it.
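A minimal sketch of the base-instance trick, done on the CPU in Python. It assumes the non-indexed VkDrawIndirectCommand layout (four 32-bit unsigned integers), a little-endian host, one instance per draw, and that the drawIndirectFirstInstance device feature is enabled so non-zero firstInstance values are allowed. `build_indirect_buffer` and its input format are hypothetical:

```python
import struct

# VkDrawIndirectCommand: vertexCount, instanceCount, firstVertex, firstInstance.
DRAW_CMD = struct.Struct("<4I")


def build_indirect_buffer(draws):
    """draws: list of (vertex_count, first_vertex) tuples, one per draw."""
    buf = bytearray(DRAW_CMD.size * len(draws))
    for draw_id, (vertex_count, first_vertex) in enumerate(draws):
        # instanceCount = 1 and firstInstance = draw_id, so the vertex
        # shader sees InstanceIndex == draw_id for this draw.
        DRAW_CMD.pack_into(buf, draw_id * DRAW_CMD.size,
                           vertex_count, 1, first_vertex, draw_id)
    return bytes(buf)


buf = build_indirect_buffer([(36, 0), (24, 36), (6, 60)])
for i in range(3):
    print(DRAW_CMD.unpack_from(buf, i * DRAW_CMD.size))
# (36, 1, 0, 0)
# (24, 1, 36, 1)
# (6, 1, 60, 2)
```

If a draw needs more than one instance, InstanceIndex no longer equals the draw number, so the shader would need some other way to recover it, such as a per-instance vertex attribute.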

ParticlePeter · March 10, 2016, 12:53pm

Thanks, that is a quite reasonable answer, I lost track of the fact that mobile IHVs are on-board and might lag hardware capabilities for the core spec.

[QUOTE=Alfonse Reinheart;39942]OK, but that’s not what transform feedback does, so I don’t see how that matters. As I’m sure you’re aware, transform feedback is about marking outputs from a GS to be recorded into a buffer(s). This can be very easily done from a geometry shader.

It has nothing to do with removing any “Elements” from a buffer.[/QUOTE]

Obviously it does not remove anything; it copies arbitrary data, in order, into another buffer consecutively.

[QUOTE=Alfonse Reinheart;39942]The source of my knowledge is thinking about the tools Vulkan provides.

OK, every time you call EmitVertex in a GS that’s using transform feedback, the implementation takes the data in your output variables and writes them to the various buffers bound for feedback. That’s something that you can do yourself by writing to an SSBO. The only issue is figuring out where in the SSBO’s array to write to. This will be based on your particular invocation, relative to other GS invocations in the rendering command.

That’s where PrimitiveID comes in. If your GS writes 3 vertices in each GS invocation, then you know that the invocation for PrimitiveID 0 will write to locations 0, 1, and 2 in the SSBO array. PrimitiveID 3’s invocation will write to locations 9, 10, and 11. And so forth.

If you’re instancing your GS, then you need to combine PrimitiveID with InvocationID: multiply PrimitiveID by the number of invocations per primitive, then add InvocationID.

Now, if you’re conditionally writing with your GS, then we need to talk a bit more about what data you’re conditionally writing. If the data you’re writing is unordered (that is, it doesn’t really matter what order you write it in, so long as it’s tightly packed; this is common for frustum culling), then you can use an atomic counter to compute the index. Each invocation increments the atomic counter if it is going to write data.

Of course, if you don’t need order, then you ought to be using a compute shader. Don’t pretend frustum culling is a rendering operation.[/QUOTE]

Nice, that will work, thanks! However, I do assume that desktop GPUs have more optimal hardware support for the general concept of append/consumeBuffers, other than emulating it with atomics in the driver, no?

[QUOTE=Alfonse Reinheart;39942]It’s the usual reason why ARB extensions aren’t instantly part of core OpenGL. It’s why ARB_sparse_texture and ARB_bindless_texture never made it to core: because not all 4.x hardware can support it.

Shader draw parameters requires that an implementation be able to provide 3 pieces of data to the VS:

The base instance value. Since Vulkan uses InstanceIndex instead of InstanceID, this is unnecessary. The InstanceIndex has the base instance added to it already, so the shader doesn’t need a separate value for the base instance.
The base vertex index. Again, since VertexIndex already includes the base vertex index, there’s no need for a separate value.
A counter that increments for each drawing command. That one requires hardware support, since it must also work with multi-draw indirect commands (vkCmdDrawIndirect with a drawCount greater than 1).

So Vulkan already has 2/3rds of the data. And the last 1/3rd requires direct hardware support. Admittedly, they could have added that as an optional feature, but at some point, you’ve got to ship the thing out the door. DrawID alone isn’t all that important.

Especially since Vulkan only permits multidraw operations with indirect rendering commands. So if you need the equivalent of DrawID, you can always use the base instance if the hardware allows it.[/QUOTE]

I was not aware that it is not trivial to add DrawID as a built-in shader variable when multiDrawIndirect is available. For me it’s a really nice-to-have feature, and I think it will be one of the early extensions to come, at least for desktop GPUs.

Alfonse_Reinheart · March 10, 2016, 6:43pm

However, I do assume that desktop GPUs have more optimal hardware support for the general concept of append/consumeBuffers, other than emulating it with atomics in the driver, no?

No.

Hardware has been progressing away from special features, not towards them. However useful the “general concept of append/consumeBuffers” is, image load/store and atomic counters are more useful. They are able to do everything that special-case append/consume buffers can, but they can do so much more.

Just think in terms of append/consume operations. Because the atomic counter can live anywhere, it can be the instance count field of a draw indirect command. This would allow you to determine how much to draw purely on the GPU, with no need for a CPU read operation (obviously, you need a sync before issuing the draw).

You can’t do that with transform feedback. Oh sure, you can use glDrawTransformFeedback to allow the feedback operation’s size to determine how many primitives to render. But you can’t use it to determine how many instances to render, if you were doing frustum culling of instanced objects. Or to use multiple atomic counters to increment different sets of instance counts, if you were doing frustum culling of LOD based instanced objects.
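To make the LOD case concrete, here is a hedged CPU-side sketch in Python. A sequential loop stands in for the parallel culling invocations, a read-modify-write on the command buffer stands in for the GPU atomic add, and the command layout is assumed to be the non-indexed VkDrawIndirectCommand (four little-endian 32-bit unsigned integers). `cull_into_lods`, `is_visible`, and `pick_lod` are hypothetical names:

```python
import struct

# vertexCount, instanceCount, firstVertex, firstInstance
DRAW_CMD = struct.Struct("<4I")
INSTANCE_COUNT_OFFSET = 4  # byte offset of instanceCount inside one command


def cull_into_lods(objects, is_visible, pick_lod, lod_vertex_counts):
    """Build one indirect command per LOD; culling fills in instanceCount."""
    n_lods = len(lod_vertex_counts)
    cmds = bytearray(DRAW_CMD.size * n_lods)
    for lod, vertex_count in enumerate(lod_vertex_counts):
        DRAW_CMD.pack_into(cmds, lod * DRAW_CMD.size, vertex_count, 0, 0, 0)

    instance_lists = [[] for _ in range(n_lods)]
    for obj in objects:  # on the GPU these iterations would run in parallel
        if not is_visible(obj):
            continue
        lod = pick_lod(obj)
        offset = lod * DRAW_CMD.size + INSTANCE_COUNT_OFFSET
        # Stand-in for an atomic add on the command's instanceCount field.
        (old,) = struct.unpack_from("<I", cmds, offset)
        struct.pack_into("<I", cmds, offset, old + 1)
        # On the GPU the instance data would be written at index `old`.
        instance_lists[lod].append(obj)
    return bytes(cmds), instance_lists


# Objects are represented only by their distance from the camera.
distances = [1, 5, 12, 30, 50, 200]
cmds, lists = cull_into_lods(
    distances,
    is_visible=lambda d: d < 100,
    pick_lod=lambda d: 0 if d < 10 else (1 if d < 40 else 2),
    lod_vertex_counts=[300, 100, 20],
)
for lod in range(3):
    print(DRAW_CMD.unpack_from(cmds, lod * DRAW_CMD.size))
# (300, 2, 0, 0)
# (100, 2, 0, 0)
# (20, 1, 0, 0)
```

The command buffer that comes out can be consumed directly by an indirect draw, so the CPU never has to read the counts back.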