reading from the framebuffer

Sadly, one of the most important things in the shading language has been removed - the possibility of reading the framebuffer.

As stated in Issue 7 & Issue 23, it has been decided against because “There is too much concern about impact to performance and impracticality of implementation.”

Hm, so the problem is that it is too hard to do. So again it’s left to us users to come up with a hack like:

write the scene to a pixel buffer/texture
map the texture pixel-wise onto the polygons

How different is the framebuffer from pixelbuffers? How hard can this be?
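For concreteness, a minimal sketch of that workaround in plain C/OpenGL (the texture size, the fb_tex name and the draw_scene/draw_scene_with_feedback_texture calls are placeholders; it assumes the texture matches the window size):

```c
#include <GL/gl.h>

#define FB_WIDTH  512
#define FB_HEIGHT 512

void draw_scene(void);                        /* placeholder */
void draw_scene_with_feedback_texture(void);  /* placeholder */

static GLuint fb_tex;

void init_feedback_texture(void)
{
    glGenTextures(1, &fb_tex);
    glBindTexture(GL_TEXTURE_2D, fb_tex);
    /* Allocate a texture the same size as the framebuffer. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, FB_WIDTH, FB_HEIGHT, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
}

void feedback_pass(void)
{
    /* 1. Render into the framebuffer as usual. */
    draw_scene();

    /* 2. Copy the framebuffer into the texture. */
    glBindTexture(GL_TEXTURE_2D, fb_tex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, FB_WIDTH, FB_HEIGHT);

    /* 3. Draw again, sampling fb_tex wherever gl_FBColor would have been read. */
    glEnable(GL_TEXTURE_2D);
    draw_scene_with_feedback_texture();
}
```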

For me this is crucial functionality, and Issue 8 states “Overall, the decision here is to set a goal for hardware to strive towards for the next few years.”

Yes, I agree. Frankly I think that this restriction makes much of what has been put into consumer-level hardware in recent months (read: floating-point buffers) almost irrelevant.

nutball wrote:
I think that this restriction makes [floating-point buffers] almost irrelevant.

Do you have a specific algorithm in mind? Perhaps if the ARB gets examples of what cannot be done due to this change, they’ll reconsider. The specification does include the possibility to query whether the shader is a “slow” one or not, so I really don’t see the problem. If the hardware can’t read the framebuffer, the compiler has to be aware of that and generate a shader that will run on the CPU. The programmer can query if that’s the case and switch algorithms accordingly.

For me it’s not much of an issue, because I can live with just a programmable blending unit (something akin to the separate blending factors extension). Many algorithms can be changed (with the side effect of better performance) if one switches from “write to the fb and stop if this value is in it” to “if I’m about to write this value to the fb, don’t”. There are algorithms which need to read the framebuffer before doing something useful, but there are ways to implement even those: copy the framebuffer to a texture, basically; a multiple-buffer extension can help here too (e.g. à la ATI’s).

It’s also noteworthy that every gl_FB* variable has been removed, not only gl_FBColor. That means there’s no gl_FBDepth (I guess you can live with that), but there’s no gl_FBStencil (ouch) and there’s no gl_FBDatan (double ouch, since auxiliary buffers are suggested as an alternative for the lack of gl_FBColor).

Looking at issue 7 (is alpha blending programmable?), there are enough reasons as to why alpha blending can’t be made programmable in the fragment shader. But it doesn’t say if there’s some flexibility after the fragment shader. Further on, issue 43 talks about dPdx(gl_FB*). A couple of weeks ago I had a really hard time explaining how such a thing could be implemented. In the end, I resorted to “/somehow/ you get the current values of the framebuffer at neighbouring positions – and please don’t ask what happens if multisample is enabled”, so I guess I agree with the argument presented. Issue 77 is just more of the same.

My guess here is that you’ll be able to have your fragment program run, and write to, say gl_FragDepth and gl_FragStencil, and then have the usual alpha, stencil, blend and logic operations.

Originally posted by nutball:
Yes, I agree. Frankly I think that this restriction makes much of what has been put into consumer-level hardware in recent months (read: floating-point buffers) almost irrelevant.

No, it is a good thing to remove features that can hurt performance of shaders. We are in real time, so it is always a trade-off between speed and flexibility.
Then, I agree that we will need a FIXED blending stage adapted to floating-point frame buffers.

Originally posted by phook:
Sadly one of the most important thing in the shading language have been removed - the possibility of reading the framebuffer.

Sorry. This was a tough one. As noted in the issues, many wanted it and wanted it fast. (Disclosure - I’ve personally argued against FB reads for a long time.)

The answer has been consistent - you can’t have both. The surprise in December was therefore really relatively minor (but still surprising).

So, can we try to separate things in a few different ways?

  1. If you need FB reads because you can’t do traditional fixed function blending to float buffers. IF you could still do fixed function blending to float buffers, are you still interested in FB reads?

  2. If you need FB reads because fixed function blending is too limited. IF you could generalize fixed function blending, are you still interested in FB reads?

  3. If you need FB reads of color, depth, stencil, WHICH depth and stencil do you want when multisampling? Can you explain (as specifically as you are comfortable with) what you are trying to do?

-mr. bill

Originally posted by tayo:
No, it is a good thing to remove features that can hurt performance of shaders. We are in real time, so it is always a trade-off between speed and flexibility.

You may be “in real time”, but not everybody is. Please point out where it says that OpenGL is for real-time use only.

At least it says ‘high performance’.

mrbill,
#1 would be great. Accumulation is probably the most important one. If I may beg for more:

dst.rgb = dst.alpha * src.color + dst.color
dst.alpha = src.alpha

that would truly be heaven.
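For what it’s worth, that combination looks like it maps onto the separate-blend-factors extension mentioned earlier - a sketch, assuming EXT_blend_func_separate is exposed (whether it would work on float buffers is exactly the open question):

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>   /* EXT_blend_func_separate prototype */

/* Desired result:
 *   dst.rgb   = dst.alpha * src.rgb + dst.rgb
 *   dst.alpha = src.alpha                       */
void set_begged_for_blend(void)
{
    glEnable(GL_BLEND);
    glBlendFuncSeparateEXT(GL_DST_ALPHA, GL_ONE,    /* RGB:   dst.a * src + 1 * dst   */
                           GL_ONE,       GL_ZERO);  /* alpha: 1 * src.a   + 0 * dst.a */
}
```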

Regarding #3 - strange question.
If I got this right, multisampling revolves around the assumption that one color value is enough for n subpixels. So, IMO, the sensible thing to do would be to give ‘the’ (i.e. central) Z/stencil values if the shader uses them to modify color (à la depth-range fog). If, on the other hand, the shader only reads and modifies Z/stencil, technically that part of the shader should be executed separately for each subpixel (though it may well just be a broken shader … not my call).

If the shader does both, the hardware should do both. Trick the color part into thinking that the central Z/stencil is it, and do multiple runs of the Z/stencil-modifying parts with individual values.

That would make the hardware a lot more complex.
Not that I would want or advocate it, but it seems right with regard to multisampling semantics.

Hm, my post was more like “SIGH - no FB reads” than “GRR - I want FB reads”.

I have read the arguments in the paper and sympathize with them, but I still feel that it’s a serious omission.

1) If you need FB reads because you can’t do traditional fixed function blending to float buffers. IF you could still do fixed function blending to float buffers, are you still interested in FB reads?

Yes, I would - it’s not blending I am interested in - it is serious multipass.


2) If you need FB reads because fixed function blending is too limited. IF you could generalize fixed function blending, are you still interested in FB reads?

Still interested ;-). (But if I can’t have it, this would soothe it a bit.)


3) If you need FB reads of color, depth, stencil, WHICH depth and stencil do you want when multisampling? Can you explain (as specifically as you are comfortable with) what you are trying to do?

Well, I can see that multisampling poses a problem, since the values that would make sense (to me) would be the NOT multisampled ones (viewing multisampling as a post-process).

As to what I am trying to do, well, I am not specifically trying to do anything - but a couple of the ideas I have could benefit from FB reads.

They can be implemented using the feedback mechanism of FB -> texture, or render-to-texture, although this would prevent the polygons from interacting through the FB (unless you copy the FB between every polygon), not to mention that it lacks the elegance that FB reads would give.

I am perfectly aware that shaders using FB reads would take a performance hit, as they would introduce the spatial/temporal order dependency - but maybe the pipe could change into a mode allowing this? Certainly when FB reads are not used, the pipe should not be slowed down by this requirement.
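Just to spell out why the copy route feels so inelegant, “copy the FB between every polygon” ends up looking something like this (a sketch; fb_tex, poly_count and draw_polygon are placeholder names):

```c
#include <GL/gl.h>

void draw_polygon(int i);   /* placeholder: its shader samples fb_tex */

/* Each polygon that wants to "read the framebuffer" forces a full copy
 * first, which is far more work than letting the shader read the pixel
 * underneath it would be. */
void draw_interacting_polygons(GLuint fb_tex, int poly_count,
                               int width, int height)
{
    int i;
    for (i = 0; i < poly_count; ++i) {
        glBindTexture(GL_TEXTURE_2D, fb_tex);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);
        draw_polygon(i);
    }
}
```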

Anyway, I think the OpenGL Shading Language hits home on all subjects other than this one, and all I am saying is: don’t let this go just because it is impractical to implement.

mrbill wrote:

  1. If you need FB reads because you can’t do traditional fixed function blending to float buffers. IF you could still do fixed function blending to float buffers, are you still interested in FB reads?

yes, as long as…

  2. If you need FB reads because fixed function blending is too limited. IF you could generalize fixed function blending, are you still interested in FB reads?

… this exists, too.

Basically what I’m interested in is premultiplied-alpha blending. In the end it’s a question of “DST.Color = SRC.Color + (1 - SRC.Alpha) * DST.Color” vs “DST.Color = SRC.Alpha * SRC.Color + (1 - SRC.Alpha) * DST.Color”. The difference is that one of the operations is associative and the other one is not. Notice that in the non-premultiplied case the equation for the alpha channel is different from the equation for the RGB channels. In the premultiplied case, it’s just one equation, and you can express both using glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA). The problem is that you have to premultiply, which, using 8-bit channels, reduces your resolution in color space /visibly/ when compositing more than a couple of polygons. To work around this, you could avoid premultiplying and do the multiplication in the fragment program, passing the appropriate values to the blend stage. It’s better than just premultiplying, but it still has artefacts (for whatever reason). My gut feeling tells me that having a fragment program working with floats and a matching blending stage would fix the problem, so I guess that makes point 2) moot.

At any rate, I’d still be interested in having something like the separate blending factors extension. I’d be happy just being able to specify four sources for the factors (2 for RGB, 2 for alpha), but I guess some folks would like to have eight (6 for RGB, 2 for alpha).
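To make the two cases concrete, here is a sketch of the corresponding blend state (the straight-alpha variant assumes the separate-factors extension; the destination-alpha equation shown there is the usual coverage-style choice, not the only possible one):

```c
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

/* Premultiplied alpha: one equation covers RGB and alpha,
 *   dst = src + (1 - src.a) * dst                              */
void blend_premultiplied(void)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
}

/* Straight (non-premultiplied) alpha: RGB and alpha equations differ,
 *   dst.rgb = src.a * src.rgb + (1 - src.a) * dst.rgb
 *   dst.a   = src.a           + (1 - src.a) * dst.a
 * so separate RGB/alpha factors are needed.                    */
void blend_straight(void)
{
    glEnable(GL_BLEND);
    glBlendFuncSeparateEXT(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA,
                           GL_ONE,       GL_ONE_MINUS_SRC_ALPHA);
}
```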

  3. If you need FB reads of color, depth, stencil, WHICH depth and stencil do you want when multisampling? Can you explain (as specifically as you are comfortable with) what you are trying to do?

Straw poll…

The one that’s visible in the framebuffer. It’s hard to tell without knowing which multisample pattern we are talking about here, and without knowing how multisampling is implemented in hardware. I remember we had a discussion when NVIDIA came out with their QQ multisampling thing, about their claim of requiring less memory than other implementations while still requiring more memory than no multisampling. Multisample operations are local; that means you don’t need to store the whole oversampled image, but instead you can have a locally oversampled region and reduce it to the on-screen pixels. You wait for all the data that will make up a pixel to be available, and then you write it to the framebuffer. I guess I’m saying that yes, intuitively I see there’s a problem if multisampling is activated, but I can’t put my finger on the exact kind of problem (other than a more complex GPU design, that is).

Originally posted by phook:
1)…
Yes, I would - its not blending I am interested in - it is serious mulitpass.

Wouldn’t an f-buffer solve the spatial/temporal order dependency issue in this case? It looks easy to implement and it has some other advantages - being able to store any intermediate results, and not failing in some cases when transparency is involved (as when using the framebuffer).

See http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/fbuffer.pdf for more info

Originally posted by GeLeTo:
Wouldn’t an f-buffer solve the spatial/temporal order dependency issue in this case? It looks easy to implement and it has some other advantages - being able to store any intermediate results, and not failing in some cases when transparency is involved (as when using the framebuffer).

See http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/fbuffer.pdf for more info

The problem with reading the framebuffer has to do with pipeline hazards. If you have two fragments “on top of each other” that both need to read the framebuffer, you need to fully shade the first one and apply it to the frame buffer before going to the next fragment. In some ways an fbuffer makes this problem worse because it makes the pipeline effectively much longer.

Originally posted by cass:
The problem with reading the framebuffer has to do with pipeline hazards. If you have two fragments “on top of each other” that both need to read the framebuffer, you need to fully shade the first one and apply it to the frame buffer before going to the next fragment. In some ways an fbuffer makes this problem worse because it makes the pipeline effectively much longer.

I was speaking about the multipass case (drawing the same object over and over with blending). The f-buffer should solve the problem because when the final pixel is written to the frame buffer no blending will be involved. All the blending is done in the f-buffer.

Originally posted by cass:
The problem with reading the framebuffer has to do with pipeline hazards. If you have two fragments “on top of each other” that both need to read the framebuffer, you need to fully shade the first one and apply it to the frame buffer before going to the next fragment. In some ways an fbuffer makes this problem worse because it makes the pipeline effectively much longer.

But is that cost worth completely banning it?
Some algorithms would run so much faster that it might be worth forcing the interlocks.

Quick Example:
Custom depth buffers, only really possible if you have read/modify/write. Depth peeling could then become fast (e.g. peel multiple layers per pass). Depth peeling still seems to be the best hope for handling transparency well.
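For comparison, one pass of conventional depth peeling without FB reads looks roughly like this - a heavily simplified sketch along the lines of the usual shadow-compare approach, which omits the texture-coordinate setup that projects depth_tex over the scene, the compositing of the peeled layers, and the creation of the depth texture itself (depth_tex, num_layers and draw_scene are placeholders):

```c
#include <GL/gl.h>
#include <GL/glext.h>   /* ARB_shadow / ARB_depth_texture enums */

void draw_scene(void);  /* placeholder */

/* One full scene pass per peeled layer: the previous layer's depth is kept
 * in a depth texture, and the shadow compare (GL_GREATER) rejects anything
 * not strictly behind it, so the normal depth test finds the next layer.
 * With FB read/modify/write, several layers could be peeled in one pass. */
void depth_peel(GLuint depth_tex, int num_layers, int width, int height)
{
    int layer;
    for (layer = 0; layer < num_layers; ++layer) {
        if (layer > 0) {
            /* Only accept fragments farther than the last peeled layer. */
            glBindTexture(GL_TEXTURE_2D, depth_tex);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE_ARB,
                            GL_COMPARE_R_TO_TEXTURE_ARB);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_FUNC_ARB,
                            GL_GREATER);
        }
        glClear(GL_DEPTH_BUFFER_BIT);
        draw_scene();

        /* Save this layer's depth for the next pass. */
        glBindTexture(GL_TEXTURE_2D, depth_tex);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);
    }
}
```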

It just seems too harsh to disallow it.

OK, I’ve calmed down a bit now. Yes, I’d be happy with post-fragment-program blending in FP buffers. That would solve my immediate requirements. For the stuff I have an inkling I want to do in the future, programmable blending would be better.

A related question is: why isn’t the current value of a pixel in the framebuffer provided as an input to fragment shaders now? As Cass says, it has to do with data hazards. By providing the framebuffer as an input to the shader (effectively a framebuffer read), the GPU fragment pipeline is constrained by ordering semantics. This is one reason why GPUs have retained a dedicated blend unit.

I would suggest, however, that exposing a floating-point blend unit along with a programmable blending model (instead of the current fixed GL blend funcs) would be a positive step. I understand that this might significantly increase the amount of logic needed for the blend units, and place a burden on the memory system. However, the current workarounds for simulating floating-point blends are fairly intrusive and much less efficient.

Eric