VK_FORMAT_R8G8B8A8_[U/S]SCALED vs VK_FORMAT_A8B8G8R8_[U/S]SCALED_PACK32 ?

What’s the difference between the two? The first one describes the components as separate bytes, while the second one describes the channels as 8-bit portions of a uint32. Does that mean the first format is endianness-independent, while the second is compatible with the first only on little-endian machines? What is the purpose of keeping both options? Are there any (even legacy) use cases?

While the above formats look like legacy Vertex Fetch support (by the way, why still support fixed-point quantization in Vulkan?), I have the same question for these pixel formats:

(Endianness independent)

  VK_FORMAT_R8G8B8A8_UNORM          
  VK_FORMAT_R8G8B8A8_SRGB 
  VK_FORMAT_R8G8B8A8_SNORM
  VK_FORMAT_R8G8B8A8_UINT 
  VK_FORMAT_R8G8B8A8_SINT 

vs

(Endianness dependent)

  VK_FORMAT_A8B8G8R8_UNORM_PACK32 
  VK_FORMAT_A8B8G8R8_SRGB_PACK32
  VK_FORMAT_A8B8G8R8_SNORM_PACK32
  VK_FORMAT_A8B8G8R8_UINT_PACK32 
  VK_FORMAT_A8B8G8R8_SINT_PACK32 

Also, are the other PACK16 / PACK32 formats endianness-dependent?

The “PACK” formats mean that each texel is stored in a value of the given size. So yes, they are effectively endian-dependent.
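
To make that concrete, here is a small host-side C sketch (the pixel values are made up) of why the byte-addressed format has the same byte order everywhere, while the PACK32 one follows the host’s endianness:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      /* VK_FORMAT_R8G8B8A8_UNORM: components are consecutive bytes,
         so the byte order in memory is R, G, B, A on any host. */
      uint8_t rgba[4] = { 0x11 /*R*/, 0x22 /*G*/, 0x33 /*B*/, 0x44 /*A*/ };

      /* VK_FORMAT_A8B8G8R8_UNORM_PACK32: components are bit-fields of one
         32-bit word (A in bits 31..24, B in 23..16, G in 15..8, R in 7..0),
         so the byte order in memory depends on the host's endianness. */
      uint32_t abgr = (0x44u << 24) | (0x33u << 16) | (0x22u << 8) | 0x11u;

      uint8_t packed[4];
      memcpy(packed, &abgr, sizeof packed);

      /* Little-endian host: both lines print 11 22 33 44.
         Big-endian host: the packed one prints 44 33 22 11. */
      printf("R8G8B8A8       : %02X %02X %02X %02X\n", rgba[0], rgba[1], rgba[2], rgba[3]);
      printf("A8B8G8R8_PACK32: %02X %02X %02X %02X\n", packed[0], packed[1], packed[2], packed[3]);
      return 0;
  }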

What is the purpose of keeping both options?

A better question would be why discard an option, if the hardware can support it? Are we running out of enums or something?

why still support fixed-point quantization in Vulkan

I will assume that by “fixed-point quantization” you mean “normalized” formats. If so, the reason is simple:

Because memory didn’t suddenly become free just because Vulkan exists.

Yes, there are many places where small-format floats, shared-exponent floats, or just full-on 16- and 32-bit floating-point images are used. But there are plenty of times when 8 bits per channel, normalized, is entirely sufficient.

Why throw away a perfectly functional and useful hardware feature that saves space and performance for many cases? If there is specialized hardware for accessing and writing to them (and there is), why not make it available?

A better question would be why discard an option, if the hardware can support it? Are we running out of enums or something?

Discarding such legacy features from the API forces developers to use the preferred formats introduced later.
As a result, it gives a chance to remove their dedicated HW support from chips in a few years (and eventually support the few apps still using them with a SW workaround), especially at the moment when a new API is introduced.
The saved gates/die area can then be used for new HW features.
(There are never enough gates and die space.)

By “fixed-point quantization” I meant the U/S SCALED formats, where an unsigned/signed int is cast to a float value.
Such values don’t even have the fractional part that the old OpenGL ES 1.x 16.16 fixed-point formats had.
Instead of using them, we could use U/SINT formats and just cast them to floats ourselves in the shader if we really need to do that.
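
For clarity, a tiny C sketch of the arithmetic involved, assuming a raw byte value of 200 (exact rounding rules aside):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint8_t stored = 200;                 /* raw byte in the vertex buffer */

      float as_unorm   = stored / 255.0f;   /* R8_UNORM   -> ~0.784          */
      float as_uscaled = (float)stored;     /* R8_USCALED -> 200.0           */
      /* R8_UINT hands the shader the integer 200; casting it to float there
         gives the same 200.0 that USCALED would have produced. */

      printf("UNORM: %f  USCALED: %f\n", as_unorm, as_uscaled);
      return 0;
  }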

OK so… which one is “preferred”? Which of the two should we have?

Also, who decides what is a “legacy feature”?

[QUOTE=Karol Gasinski;39867]As a result, it gives a chance to remove their dedicated HW support from chips in a few years (and eventually support the few apps still using them with a SW workaround), especially at the moment when a new API is introduced.
The saved gates/die area can then be used for new HW features.
(There are never enough gates and die space.)[/quote]

Really? We’re talking about endian conversion logic here. Do you seriously believe that this would be anything more than a rounding error in the GPU’s overall transistor count? Even for mobile hardware?

Also, I fail to see how being able to work with different endiannesses counts as a “legacy feature”.

[QUOTE=Karol Gasinski;39867]By “fixed-point quantization” I meant the U/S SCALED formats, where an unsigned/signed int is cast to a float value.
Such values don’t even have the fractional part that the old OpenGL ES 1.x 16.16 fixed-point formats had.
Instead of using them, we could use U/SINT formats and just cast them to floats ourselves in the shader if we really need to do that.[/QUOTE]

By that logic, you should just ignore the input format altogether and fetch whatever data you want from buffers directly, based on InstanceIndex and VertexIndex.

Having vertex formats that are separate from your shader allows you to do things like decide how to compress your data separately from your vertex shader’s use of that data. For one mesh, you might use normalized integers, and for another, you might use 8-bit scaled integers. Why should your vertex shader have to know or care how your vertex data is compressed?
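
As a rough sketch of what that looks like on the API side (the binding, location, and offset values here are made up), the per-mesh choice lives entirely in the attribute description, not in the shader:

  #include <vulkan/vulkan.h>

  /* Mesh A: attribute stored as 8-bit signed normalized data ([-1, 1]). */
  static const VkVertexInputAttributeDescription attrMeshA = {
      .location = 0,                          /* layout(location = 0) in the VS */
      .binding  = 0,
      .format   = VK_FORMAT_R8G8B8A8_SNORM,
      .offset   = 0,
  };

  /* Mesh B: the same attribute stored as 8-bit scaled integers ([0, 255]). */
  static const VkVertexInputAttributeDescription attrMeshB = {
      .location = 0,                          /* same location, same shader     */
      .binding  = 0,
      .format   = VK_FORMAT_R8G8B8A8_USCALED,
      .offset   = 0,
  };

  /* Both descriptions feed the same vertex shader module; only the pipeline's
     VkPipelineVertexInputStateCreateInfo differs between the two meshes. */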

Now granted, Vulkan bakes the input format directly into the pipeline. And, thanks to AMD’s lack-of-dedicated-vertex-hardware, it isn’t even part of the pipeline’s dynamic state, so you’re not allowed to vary it without making a whole new pipeline. But having dedicated vertex fetch hardware is not “legacy”; it’s more “not sucking.”

And in any case, why does it matter so much? Not one of the SCALED formats is required to be implemented by Vulkan implementations.
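
An application that does want them can simply ask the implementation; a minimal sketch, assuming physicalDevice was obtained elsewhere via vkEnumeratePhysicalDevices:

  #include <vulkan/vulkan.h>
  #include <stdbool.h>

  /* Returns true if 'format' can be used for vertex attributes on this device. */
  static bool supportsVertexFormat(VkPhysicalDevice physicalDevice, VkFormat format)
  {
      VkFormatProperties props;
      vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);
      return (props.bufferFeatures & VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT) != 0;
  }

  /* Usage: if supportsVertexFormat(physicalDevice, VK_FORMAT_R8G8B8A8_USCALED)
     returns false, fall back to VK_FORMAT_R8G8B8A8_UINT and convert in the shader. */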

By that logic, you should just ignore the input format altogether and fetch whatever data you want from buffers directly, based on InstanceIndex and VertexIndex.

That’s the “Programmable Vertex Fetch” way of doing things. Its only drawback is that Vertex Shader execution will stall right at its beginning, when we fetch the first data.
That’s why we need a fixed-function Vertex Fetch unit to pre-fetch data up front (more than to introduce an abstraction layer between the data and the shader).
The reasoning was probably that, while it’s doing the prefetch, it can also decompress the data from whatever quantization it is in.

The mentioned lack of a fixed-function Vertex Fetch unit in AMD HW leads to the conclusion that they do all those format conversions at the beginning of the vertex shader, the same way an app developer would. That’s why it cannot be part of the dynamic pipeline state: it would introduce a shader recompile on rebind of a vertex buffer with a different attribute format.

Having vertex formats that are separate from your shader allows you to do things like decide how to compress your data separately from your vertex shader’s use of that data.

That’s true for formats that differ only in precision and can be used interchangeably with only a small quality loss.

For one mesh, you might use normalized integers, and for another, you might use 8-bit scaled integers. Why should your vertex shader have to know or care how your vertex data is compressed?

I don’t understand how I could describe one mesh with normalized integers, which represent a [-1.0 … 1.0] or [0.0 … 1.0] quantization, and another one with scaled 8-bit integers, which have no fractional part and represent a [0.0, 255.0] range, and at the same time use the same Vertex Shader for both of them. In the first case I would want a specialised Vertex Shader to benefit from the fractional precision of the normalized range, while in the other case I completely don’t care about fractional precision (probably for some really old DOS program or a very specific CAD application aligning everything to a grid?).

And in any case, why does it matter so much?

I’m asking these questions out of curiosity, to fully understand the reasons for specific formats to exist.
I understand that they are a legacy of OpenGL, so maybe digging up some old OpenGL extensions would help here.

[QUOTE=Karol Gasinski;39874]That’s the “Programmable Vertex Fetch” way of doing things. Its only drawback is that Vertex Shader execution will stall right at its beginning, when we fetch the first data.
That’s why we need a fixed-function Vertex Fetch unit to pre-fetch data up front (more than to introduce an abstraction layer between the data and the shader).
The reasoning was probably that, while it’s doing the prefetch, it can also decompress the data from whatever quantization it is in.[/quote]

AMD doesn’t agree on the need for “fixed function Vertex Fetch unit”, since all of their GCN-based hardware lacks such a thing. They emulate OpenGL/D3D/Vulkan’s vertex fetching by patching your vertex shader.

That’s why changing vertex formats is expensive, particularly on AMD hardware.

Only for AMD hardware. And even the “shader recompile” is not really a full “recompile”. It merely needs to patch the shader a bit.

Most vertex shaders are pretty simple. They take a position, a normal, maybe some texture coordinates, maybe a full NBT matrix. But that’s about it. The most exotic you get is a VS that does bone skinning. But outside of that, it’s simple.

What do they do with the position? Again, outside of skinning, the VS will generally multiply the position by a matrix or two. So why does the vertex shader need to know or care if the position range is [-1, 1] or [-127, 127]? If you render with a different data range, what has to change?

What has to change is the matrix used to transform those positions. If you compressed your data to the [-1, 1] range, then your matrix should include a “decompression” transform. This transform is of course factored into the overall transformation matrix from model space to camera space. If the range is [-127, 127], you simply use a different matrix.

The actual shader logic does not change; it’s still just a vec4 position multiplied by a matrix or two.
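
A minimal C sketch of that idea (column-major matrices; the helper names and the 1/127 factor are just for illustration):

  typedef struct { float m[16]; } Mat4;   /* column-major 4x4 matrix */

  /* out = a * b */
  static Mat4 mat4Mul(Mat4 a, Mat4 b)
  {
      Mat4 out;
      for (int c = 0; c < 4; ++c)
          for (int r = 0; r < 4; ++r) {
              float sum = 0.0f;
              for (int k = 0; k < 4; ++k)
                  sum += a.m[k * 4 + r] * b.m[c * 4 + k];
              out.m[c * 4 + r] = sum;
          }
      return out;
  }

  static Mat4 mat4Scale(float s)
  {
      Mat4 out = { { s, 0, 0, 0,   0, s, 0, 0,   0, 0, s, 0,   0, 0, 0, 1 } };
      return out;
  }

  /* The shader always computes "matrix * position"; only the matrix changes.
     SNORM data arrives in [-1, 1], so the mesh's real extents are already
     baked into modelToCamera.  SSCALED data arrives in [-127, 127], so an
     extra 1/127 scale is folded into the very same matrix. */
  static Mat4 matrixForSscaledMesh(Mat4 modelToCamera)
  {
      return mat4Mul(modelToCamera, mat4Scale(1.0f / 127.0f));
  }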

Saying that these formats are a “legacy of OpenGL” is like saying that samplers are a “legacy of OpenGL”. They are only a “legacy of OpenGL” in the sense that OpenGL did indeed have support for them. But OpenGL provided that support because the hardware could do it and users wanted access to that hardware.

AMD doesn’t agree on the need for “fixed function Vertex Fetch unit”, since all of their GCN-based hardware lacks such a thing. They emulate OpenGL/D3D/Vulkan’s vertex fetching by patching your vertex shader.

I thought I just wrote that in my previous post.

That’s why changing vertex formats is expensive, particularly on AMD hardware.

It’s expensive in OpenGL, as the app doesn’t know when it can cause patching; in Vulkan we need to change the whole pipeline anyway, don’t we? (So the cost is incorporated into the pipeline switch.)

And even the “shader recompile” is not really a full “recompile”. It merely needs to patch the shader a bit.

It’s not as simple as you think. Shader patching can be done at different levels of shader abstraction (from patching the shader binary to patching an intermediate representation), and may require partial recompilation in the driver.
It’s vendor/driver implementation dependent. Shader patching, and all the sadness of OpenGL’s non-orthogonal states, was what everybody wanted to avoid in Vulkan and DX12.

Most vertex shaders are pretty simple. They take a position, a normal, maybe some texture coordinates, maybe a full NBT matrix. But that’s about it. The most exotic you get is a VS that does bone skinning. But outside of that, it’s simple.

I don’t think we need to explain the basics here.

So why does the vertex shader need to know or care if the position range is [-1, 1] or [-127, 127]? If you render with a different data range, what has to change?

My point was that normalized formats aren’t provided to pass positions, but rather texture coordinates, weights, etc. Vertex Shaders don’t even need to consume attributes storing a position at all.
Let’s skip the discussion about passing positions in such formats, as it’s not leading anywhere.

Saying that these formats are a “legacy of OpenGL” is like saying that samplers are a “legacy of OpenGL”. They are only a “legacy of OpenGL” in the sense that OpenGL did indeed have support for them. But OpenGL provided that support because the hardware could do it and users wanted access to that hardware.

I think you’re being too dramatic here. Can you provide real-life/production use cases for the SCALED formats?
That’s what I’ve been interested in from the beginning.