glslang - some odd decisions?

Some of the comments in the glslang doc make me FUME. From the OpenGL Shading Language doc:

“When writing general programs, programmers have long given up worrying if it is more efficient to do a calculation in bytes, shorts or longs and we do not want shader writers to believe they have to concern themselves similarly.”

Now I just don’t buy that. I believe, as a programmer, I should ALWAYS have the option of choosing the data type best suited to my purpose. And if a fixed or half data type gives me enough precision, then I should be able to use it and expect performance gains over floats (as NV30 gives me).
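
To make the request concrete, here is a hedged sketch of the kind of per-variable precision control being asked for. It uses the lowp/mediump/highp qualifiers that OpenGL ES’s shading language later adopted - nothing like this exists in the glslang spec being discussed here, so treat the syntax as illustrative only. The important property is that these are hints: an implementation with no cheaper formats can silently promote everything to 32-bit float.

    // Sketch only: GLSL ES-style precision qualifiers, not part of the
    // glslang spec under discussion.  A driver may honour the hints with
    // half/fixed arithmetic, or ignore them and compute in full float.
    precision mediump float;      // default precision for float

    uniform sampler2D baseTex;
    varying mediump vec2 uv;      // texture coordinates rarely need full float

    void main()
    {
        lowp vec4 colour   = texture2D(baseTex, uv);  // colour data: low precision is plenty
        mediump float fade = clamp(uv.y, 0.0, 1.0);   // half-float-style precision
        gl_FragColor = colour * fade;                 // output is clamped to [0,1] anyway
    }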

Discuss

Maybe the next-gen NVidia hardware will use only full-precision FP data, and no other vendor plans to implement half-precision floats or fixed data types? In that case, making this part of the core spec does not make much sense - the functionality should be exposed as an extension.

I agree with Pocketmoon.

The doc says that “programmers have long given up worrying if it is more efficient to do a calculation in bytes, shorts or longs”, but that’s not true. We are all aware that float computations are, in general, faster than doubles; that’s why people don’t just use doubles everywhere. Certainly, the original SSE and 3DNow! instruction sets don’t support doubles at all.

Considering that programmable graphics hardware is hardly as mature as CPUs, and isn’t likely to become as mature as a modern CPU for some time (if ever), it’s hardly unreasonable to allow the user to request specific data formats that may, or may not, be faster. It imposes no limitation on the implementer, who can simply ignore the hint and use 32-bit floats.

However, at some point, shader hardware may internally support 64-bit floats that take much longer to compute with. It is important to be able to support that choice if the user needs more accurate computations.

I honestly don’t see much of an advantage in performing computations at lower precision. The GeForce FX, with its 16-bit floats and a parallel integer pipe, requires you to split things up into integer, half and full float ops to get good performance, but I don’t think that is something to strive for, especially in a forward-looking graphics API. CPUs don’t actually compute stuff at different precision AFAIK, floating point values that stay in registers have 80 bits of precision. Computations on doubles take roughly the same time (maybe not for complex scalar ops like exp and friends, depending on the implementation), but they cost twice the memory and twice the cache footprint for the same data. However, storing data at lower precision is entirely different and useful for reducing memory requirements and bus usage. A lot of data doesn’t need full 32-bit floats for storage; the OpenEXR HDR image format only uses 16 bits per channel, for example.
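
To illustrate the storage point: a shader can read an image that is stored as 16-bit floats while doing all of its arithmetic at full precision, because the sampler hands the shader ordinary floats regardless of the texture’s internal format. A minimal sketch in plain GLSL, assuming the texture was created through one of the vendor float-texture extensions mentioned later in the thread (the exposure uniform and the tone-mapping curve are made up for the example):

    // Sketch: HDR image stored as 16-bit float per channel, maths done in float.
    // Storage precision and computation precision are independent choices.
    uniform sampler2D hdrImage;   // assumed fp16 internal format
    uniform float exposure;       // hypothetical exposure control
    varying vec2 uv;

    void main()
    {
        vec4 hdr = texture2D(hdrImage, uv);          // fp16 texels arrive as floats
        vec3 ldr = 1.0 - exp(-hdr.rgb * exposure);   // simple exponential tone map
        gl_FragColor = vec4(ldr, 1.0);
    }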

CPUs don’t actually compute stuff at different precision AFAIK, floating point values that stay in registers have 80 bits of precision.

Correction: x86 CPUs use 80 bits of precision. The average RISC CPU does not. Indeed, this is precisely why that provision exists: you cannot be sure that someone won’t want to make some kind of optimization based on the size of the data, or expose a higher-precision (but slower) format.

Allowing this as a possibility for providing optimizations is a good, forward-looking idea. If, at some point, 64-bit doubles (or anything with higher precision than single floats) are exposed by an implementation, the language is ready to offer higher-precision, but slower, computation.

This is a little late, and just a nit-pick, but SSE only works with floats because of their size, not the speed of working with them. The point of SSE is that you can put four floats in the 128-bit registers and perform operations on them simultaneously. If you could only fit two doubles, the small gain from that would be eaten by the cost of packing/unpacking the doubles into the register.

Well, I’m not totally positive on that, but I know that’s how MMX works (its 64-bit registers are best suited for holding four 16-bit or eight 8-bit values).

I really think not supporting 16-bit floats needs to be rethought. This could well decide whether I use Cg rather than the OpenGL Shading Language.

If the image is stored in 16-bit floats (OpenEXR, the format ILM uses for film work), it’s crazy to have to bump that to 32 bits and double the memory transfer. If 16-bit floats are automatically bumped to 32 bits within the hardware, that’s fine, but I should still be able to read and write in 16-bit float.

There is absolutely NO WAY that I could do 24 fps 2048x1556 playback of 32-bit floats. The memory transfer to the card is just not going to happen in the next few generations of graphics cards.

If the next generation of GPUs is going after the “Cinematic” look, then 16-bit float is part of that.

There is absolutely NO WAY that I could do 24 fps 2048x1556 playback of 32-bit floats.

And why would you even want to try? 2048x1556 is probably going to kill the card’s bandwidth even at 16 bpp, let alone the 64 bpp that 16-bit floats require. That resolution is truly ludicrous for anything to be released in the next 3 years. Modern games are only just now able to run at 1600x1200 with a reasonable framerate (and 24fps isn’t reasonable). That is, of course, assuming you’re doing 3D rendering rather than movie playback (which, given the 24fps, is what it sounds like you’re doing).

I don’t believe that glslang design decisions should be based solely on PC gaming solutions. 3D simulations on mid-range SGI machines run at much higher resolutions than those found on $400 PC graphics cards. OpenGL is used by many real-time 2D imaging packages, and those already run at 2K. SAN solutions are capable of delivering 2K images at faster than real time. Real-time playback of 2K images at 24fps with 12-bit color has been available for years.

If all you want to do is draw images, what kind of bandwidth problems are you expecting?

2048 * 1556 * 16 (bytes per pixel for 32-bit float RGBA) = 50,987,008 bytes, or about 48.6 MB per frame.

48.6 MB per frame * 24fps = roughly 1,167 MB per second, or about 1.14 GB/sec.

The video memory bandwidth of consumer high-end video cards today is in excess of 15GB/sec. What’s your problem?
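
For comparison, the 16-bit float storage the earlier posts asked for would be 8 bytes per pixel: 2048 * 1556 * 8 = 25,493,504 bytes, about 24.3 MB per frame, or roughly 0.57 GB/sec at 24fps.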

If you actually want to do realtime rendering, rather than just video or something, you’re not going to get 24fps out of high-quality rendering because of the sheer volume of texture space you’re going to need. There just isn’t that much fast memory in even high-end SGI boxes to do this well. They may have gigabytes of space, but that memory is usually accessed through a fast cache; the memory itself is not terribly fast. Certainly not fast enough to be transferring gigabytes of textures in a frame.

Fundamentally, what does the texture’s floating-point precision have to do with glslang? Currently, both the nVidia and ATi float texture extensions provide 16-bit floating-point formats. Why do you believe that GL2 will not continue the trend?