Using program binaries for a shader cache

I want to share an interesting experience.

Program binaries are great for a shader cache because they cut compile time.
The effect is considerable on both ATI and NVIDIA.
On ATI the gain is greater: their GLSL compiler seems to be slower, so avoiding it has quite a dramatic effect.

There is one catch, though. On both ATI and NVIDIA the binaries are VERY big. I noticed this when my shader cache at one point grew past 100MB. I keep it all in RAM to avoid slow disk operations (which would somewhat diminish the gain from avoiding the compiler).
Then I tried to compress the binaries, and they turned out to be HIGHLY compressible. I tried the very simple, weak (and fast) compression algorithm from RFC 1978.
It is literally 10 lines of code for the compression function and 10 lines for the decompression. Even that manages to squeeze the binaries by more than 3:1. More serious compressors easily reach more than 10:1. This holds on both ATI and NVIDIA.
When I examine the binaries in a hex editor, they consist mostly of vast empty or semi-empty areas and some very repetitive (and hence very compressible) textual data.
I have no idea why both NVIDIA and ATI produce such bloated binaries.
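For readers who want to try this, here is a minimal sketch in C of the Predictor scheme that RFC 1978 describes: a 64K guess table indexed by a rolling 16-bit hash, with one flag byte per group of up to eight input bytes. It is not the code used for the numbers above. The names (pred_compress, pred_decompress, pred_max_compressed_size) are hypothetical, it leaves out the RFC's PPP packet framing, and it assumes the caller stores the original length next to the compressed blob.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PRED_TABLE_SIZE 65536u

/* Worst case: every guess misses, so each group of up to 8 input
   bytes costs one flag byte plus all of its literals. */
size_t pred_max_compressed_size(size_t n)
{
    return n + (n + 7) / 8;
}

/* Compress n bytes from src into dst, which must hold at least
   pred_max_compressed_size(n) bytes. Returns the compressed size.
   Uses a static table, so this sketch is not thread-safe. */
size_t pred_compress(const uint8_t *src, size_t n, uint8_t *dst)
{
    static uint8_t guess[PRED_TABLE_SIZE];
    uint16_t hash = 0;
    uint8_t *out = dst;
    size_t i = 0;

    memset(guess, 0, sizeof guess);
    while (i < n) {
        uint8_t *flag_pos = out++;   /* reserve room for the flag byte */
        uint8_t flags = 0;
        for (int bit = 0; bit < 8 && i < n; bit++, i++) {
            uint8_t c = src[i];
            if (guess[hash] == c) {
                flags |= (uint8_t)(1u << bit);  /* predicted: emit nothing */
            } else {
                guess[hash] = c;                /* learn, then emit literal */
                *out++ = c;
            }
            hash = (uint16_t)((hash << 4) ^ c);
        }
        *flag_pos = flags;
    }
    return (size_t)(out - dst);
}

/* Rebuild exactly n original bytes into dst from src_len compressed
   bytes. Returns 1 on success, 0 if the input ends too early. */
int pred_decompress(const uint8_t *src, size_t src_len,
                    uint8_t *dst, size_t n)
{
    static uint8_t guess[PRED_TABLE_SIZE];
    uint16_t hash = 0;
    size_t in = 0, i = 0;

    memset(guess, 0, sizeof guess);
    while (i < n) {
        if (in >= src_len) return 0;
        uint8_t flags = src[in++];
        for (int bit = 0; bit < 8 && i < n; bit++, i++) {
            uint8_t c;
            if (flags & (1u << bit)) {
                c = guess[hash];                /* byte was predicted */
            } else {
                if (in >= src_len) return 0;
                c = src[in++];                  /* literal byte */
                guess[hash] = c;
            }
            dst[i] = c;
            hash = (uint16_t)((hash << 4) ^ c);
        }
    }
    return 1;
}
```

In a cache, one would compress the blob returned by the driver before storing it, and decompress it just before handing it back. Note that this scheme can never do better than 8:1, because every eight input bytes cost at least one flag byte, so the much higher ratios mentioned above need a stronger compressor.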

The moral is that binaries are indeed very useful for saving shader compile time (avoiding game stalls etc.), but be sure to use some compression, or they will soon overwhelm your memory.
Even weak, fast compression is highly effective, and even slow, strong compression is far faster than invoking the compiler.

my shader cache reached > 100MB

For how many shaders?

maybe 3000 binaries or so

That’s only 34K on average. That’s not so bad. Obviously, there’s room for improvement, but it looks like 32K of data (default uniform state) attached to a 2K block of shader data. It looks like the data is optimized for being loaded quickly with a fast DMA, rather than for space.

Are all of the binaries the same size?

Also, have you tried combining this with separable programs? What do you get then?

Actually, the facts are more grave than what I wrote.
The numbers I gave were from memory, but now I have checked the real data for one case: 957 program objects with a total uncompressed binary size of 272MB. That makes ~300k per binary!
This is on ATI. On NVIDIA the binaries are somewhat more compact, but still very large. I can't check NVIDIA right now.

I doubt the binaries are optimized for direct upload to the hardware, because they are full of textual data (it looks like variable names, GLSL types and such).
The NVIDIA binaries actually contain the shaders in assembly form; it looks like the assembly from the vertex_program/fragment_program extensions. That is not even native hardware code: the native instructions are scalar, while this assembly is vector.

That worries me a bit, because it means the “binaries” on NVIDIA will need compilation/conversion before the hardware can use them, and that may diminish the benefit of working with binaries.

On ATI the binaries seem to contain real native machine code, but it is only a tiny fragment of the total binary.

The sizes vary, but all are very big. Of the 957 binaries mentioned, the smallest is 186730 bytes and the biggest is 342334 bytes. Strangely, the binaries contain the names of dead uniforms (ones that are declared but never used in the shader). Those shaders all contain one very big uniform block with many uniforms inside, while only very few of them are used in any given shader. Yet every binary apparently contains info about all the uniforms, and maybe that is one source of bloat. But I don't think this alone can explain 300k per binary.
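As a quick sanity check on the figures quoted in this thread, the per-binary averages can be recomputed with a few lines of C. This is only a sketch: the helper name avg_binary_bytes is hypothetical, and it assumes that "MB" means 2^20 bytes, which the posts do not state.

```c
#include <stdint.h>

/* Average size in bytes of one binary, given the total cache size
   in MiB (assumed to be 2^20 bytes each) and the number of binaries. */
uint64_t avg_binary_bytes(uint64_t total_mib, uint64_t count)
{
    if (count == 0) {
        return 0;  /* avoid division by zero for an empty cache */
    }
    return total_mib * 1024u * 1024u / count;
}
```

Calling avg_binary_bytes(100, 3000) gives about 34K, which matches the estimate in the reply, and avg_binary_bytes(272, 957) gives about 290K, which lies between the reported smallest (186730 bytes) and biggest (342334 bytes) binaries.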

Yes, I have tried separable shaders and they do improve the situation, but unfortunately they have a very serious bug on ATI which renders them unusable. So until the bug is fixed, I am stuck with the old way.
the bug in question is this