Is unpacking bytes inside a shader worth it?

Hello,

I’ve been reading through other threads that touch on this, but I haven’t found a direct enough answer to my issue. Well, not so much an issue as simple curiosity.

So, I wrote a few chunks of code to store regular 32-bit floats in a homemade 16-bit format. Without going into more detail than necessary: I snap the numbers to fixed positions on a scale (1/64, 1/32, 3/64, 1/16… that sort of numeric progression). Unpacking only takes some bit shifting and a multiplication by a constant.
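A minimal sketch of the kind of quantisation being described, assuming values are snapped to multiples of 1/64 and stored as signed 16-bit integers. The post doesn’t spell out the actual format, so the names and details here are guesses:

```python
# Hypothetical sketch: snap floats to multiples of 1/64 and store them as
# 16-bit integers. Unpacking is just a multiply by a constant, as in the post.

STEP = 1.0 / 64.0  # spacing of the representable values

def pack16(x: float) -> int:
    """Snap x to the nearest multiple of 1/64, kept as 16 raw bits."""
    q = round(x / STEP)
    # clamp to the signed 16-bit range, then mask down to the raw bits
    return max(-32768, min(32767, q)) & 0xFFFF

def unpack16(bits: int) -> float:
    """Recover the (lossy) float: sign-extend, then multiply by the constant."""
    q = bits - 0x10000 if bits & 0x8000 else bits
    return q * STEP

print(unpack16(pack16(0.7123)))  # snaps to 46/64 = 0.71875
```

The round trip is deliberately lossy, which is where the “meshes deform funny” effect would come from.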

Now, there’s no bottleneck I’m working around here; this was just an aesthetic choice. I like the way meshes deform funny when you enforce the loss of precision. So there’s really no problem in using this format for storage only and working with default floats when rendering. However…

It seems wasteful, as I don’t need the extra precision. And I wonder whether I stand to gain something from passing the vertex data into the buffer I use for drawing without unpacking it (as unsigned bytes), and unpacking it within the vertex shader.

As in, could there be benefits, or would I just be shooting myself in the foot?

If this is a linear representation, why not just use normalised values and fold the scale/offset into the transformation matrix?
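To illustrate the suggestion, here is a sketch assuming OpenGL-style normalised attributes: an unsigned byte `b` arrives in the shader as `b / 255.0`, and a scale/offset baked into a matrix restores the original range. The ranges and values are made up for the example:

```python
# Sketch of normalised attributes plus a scale/offset folded into a matrix.
# Assumes the GPU delivers an unsigned byte b as b / 255.0 (normalised=GL_TRUE).
import numpy as np

lo, hi = -2.0, 2.0                       # range the mesh data lives in (made up)
verts = np.array([-2.0, 0.0, 1.5, 2.0])  # original float coordinates (1-D here)

# Quantise to unsigned bytes for the vertex buffer.
bytes_ = np.round((verts - lo) / (hi - lo) * 255).astype(np.uint8)

# What the shader sees with a normalised attribute: b / 255 in [0, 1].
normalised = bytes_ / 255.0

# Fold the de-quantisation into an affine matrix: x' = (hi - lo) * n + lo.
M = np.array([[hi - lo, lo],
              [0.0,     1.0]])           # 2x2 matrix acting on (n, 1)
restored = (M @ np.vstack([normalised, np.ones_like(normalised)]))[0]

print(restored)  # close to verts, up to 8-bit quantisation error
```

In a real renderer the scale/offset row would simply be pre-multiplied into the model matrix, so the shader does no extra work at all.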

As is the case with any performance issue, what you gain depends entirely on where your bottlenecks are.

The advantages of compression are what they’ve always been: less space and lower bandwidth for vertex data. If memory bandwidth is a performance issue, then you can help alleviate it by reducing the size of your vertex data.

Generally speaking, the performance of processing units has increased faster than memory bandwidth, so being able to shrink vertex data is often a win. However, it’s only worthwhile if that’s your actual bottleneck.
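For a rough sense of scale, here is a back-of-the-envelope calculation (illustrative figures, not from the thread): going from 32-bit to 16-bit components simply halves the per-attribute fetch traffic.

```python
# Illustrative numbers only: halving attribute size halves vertex-fetch traffic.
VERTS = 100_000

full   = VERTS * 3 * 4   # position as 3 x float32 = 12 bytes per vertex
packed = VERTS * 3 * 2   # position as 3 x 16-bit  =  6 bytes per vertex

print(f"{full / 1e6:.1f} MB vs {packed / 1e6:.1f} MB per full vertex fetch")
```

Whether that saving matters depends, as said above, on whether vertex fetch is the bottleneck in the first place.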

Hmm. It’s not entirely linear; there’s a wee bit of hackery involved. Leftover bits have to be shifted around in some cases to add a zero or do some slight rounding. Maybe it can be done, though…

The application I’m working on is fairly small. The current release build is a whopping ~800 KB c:

There’s really no call for optimization at the moment (on my end, at least); I’m just on a sort of theory quest. Exploring not-so-obvious techniques helps me get a grasp on how things work.

Extra bandwidth could come in handy. It’s not so much a question of improving performance as of making sure everything’s as lightweight as I can make it. I mostly have toasters at my disposal at the moment, so I kind of have to go a few extra miles just to be sure.

See the paper exploring performance improvements from unpacking custom vertex formats in the shader (it’s from 15 years ago, before bitwise ops were available in shaders).
