Blend shaders

A blend shader should work much like a fragment shader, running immediately after it.
The blend shader’s inputs are the fragment shader’s output and the frame buffer’s current pixel value. The blend shader’s output is the value that will be written into the frame buffer.
Not even the whole huge set of GLSL functions is necessary to make this useful. Merely providing min/max, abs, and +, −, *, / would already be a big win.
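
A minimal sketch of what such a stage could look like, written as if GLSL had it. The blend() entry point and its src/dst parameters are hypothetical; no such stage exists in any shipping GLSL version:

    // Hypothetical blend-shader stage; no such GLSL stage exists today.
    // "src" is the fragment shader's output, "dst" is the pixel already
    // in the frame buffer, and the return value is what gets written.
    vec4 blend(vec4 src, vec4 dst)
    {
        // min/max, abs and basic arithmetic already cover a lot;
        // e.g. keep the brighter of the two values (a "lighten" blend):
        return max(src, dst);
    }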

Current hardware can obviously read from the frame buffer, perform several mathematical transformations, and write back to it. Otherwise, glBlendEquation[Separate] could not work.
Blend shaders would let you replace function calls full of cryptic constants and limited functionality with one line of easily readable shader code (or more lines, if you will), which does exactly what you need.
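
For instance, standard alpha blending today takes two state calls with enum constants; in a hypothetical blend shader it becomes a single self-documenting expression:

    // Fixed function today, via cryptic enum constants:
    //   glBlendEquation(GL_FUNC_ADD);
    //   glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    //
    // The same classic alpha blend in a hypothetical blend shader:
    vec4 blend(vec4 src, vec4 dst)
    {
        return src * src.a + dst * (1.0 - src.a);
    }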

You could, among other things:

  • Use LogLuv encoding (or any other non-RGB colour representation) and do correct blending (see the sketch after this list).
  • Determine an object’s thickness (for subsurface scattering, etc.) in one pass.
  • Run a Verlet integrator without texture ping-pong.
  • Run your own accumulation buffer if you need one, and never worry about whether it is hardware accelerated.
  • Do shadow mapping with several semi-transparent occluders.
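
To make the first item concrete, here is a rough sketch of LogLuv blending. decodeLogLuv/encodeLogLuv are assumed helpers (any of the published GLSL LogLuv codecs would do); the point is that the blend happens in linear RGB even though the frame buffer stores a non-RGB encoding, something fixed-function blending cannot express:

    // decodeLogLuv/encodeLogLuv are assumed helpers; plug in any of
    // the published GLSL LogLuv codecs.
    vec3 decodeLogLuv(vec4 encoded);
    vec4 encodeLogLuv(vec3 linearRgb);

    vec4 blend(vec4 src, vec4 dst)
    {
        // The fragment shader hands us linear RGB in src; the frame
        // buffer holds LogLuv. Decode, blend linearly, re-encode.
        vec3 blended = src.rgb * src.a + decodeLogLuv(dst) * (1.0 - src.a);
        return encodeLogLuv(blended);
    }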

This has been suggested before.

Neither the current (DX10) nor the upcoming (DX11) hardware architectures support this functionality, so it cannot be implemented at this point.

It’s been suggested several times, in fact; I know I have been talking about it since at least 2006.

Technically speaking, something similar was suggested back in the old OpenGL 1.4 GLSL extension. It wasn’t implemented because of performance concerns, but it shows that it’s at least possible to do.

The main problem, I guess, is in the ROP, where blending is done today. Either you do it there, but that means the ROP has to be more complex and it will be slower; or you do it in the fragment shader, which would be great, but that creates other problems, specifically related to overdraw (a fragment may need to read a blended result that an earlier, overlapping fragment has not finished writing yet).

I am pretty sure they will crack it though, since it’s the next logical step in the pipeline.

One issue here is that the number of ROPs has remained more or less constant since the R400. Unless there is a change in the direction of hardware design, there simply isn’t enough performance to make this stage programmable.

We’ll see what the future holds in due time, I guess.

On the AMD/ATI platform, yes; it’s currently at 16, I believe. NVIDIA, on the other hand, has up to 32 ROPs on the G200 series.

The thing is that sooner or later the importance of ROPs will be in question: will they still be useful, or just in the way? Maybe it’s better to integrate them with the shader processors.

My hope is that with the advent of the G300 series this problem gets solved, and that they might then implement blend shaders in a driver update.

It’s still a feature I would love to see, but I have to admit that with OpenCL coming up, I’m not sure anymore.

It could still significantly reduce memory bandwidth on some hardware, but … well.

There is not a single mention of ROPs anywhere in the whitepaper for NVIDIA’s new Fermi-architecture GPUs.
Instead, they seem to have added large caches and a memory system more like a CPU’s, one that ensures a read always returns the most recently written value, even if it has not yet reached DRAM.

This has obviously been done for GPGPU computing such as OpenCL, but if raster ops are now done in software, then there is no reason why a blend shader extension could not be added for this next generation of NVIDIA GPUs.

Yeah, I agree. The only thing that was really missing to make this change was a way to sort fragments to the correct processor so you don’t get overdraw hazards; add to that the new L1/L2 cache structure, and suddenly you don’t need ROPs at all, which in turn forces the fragment program to do the blending.

So all that is left is to expose it.