Can gl_FragColor be read?

Zengar · December 5, 2006, 1:48pm

Korval, I still can see that this actually works on current hardware without any synchronization problems. From the FBO spec it is not clear that they even considered a positive answer (like, yes, we could allow it). Why wouldn’t they, if the hardware actually supports it? They could at least introduce it as a separate extension.

IMHO, such behaviour is difficult to control. Reading a texel that the current fragment is about to overwrite seems to be OK, but what if one reads a texel that was written in the previous shader clock (a texel from a different quad, for example)? This is where the real issues come in, synchronisation-wise.
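
A minimal CPU-side sketch of that hazard, in plain Python rather than GL (the function names are made up for illustration): a 3-tap box blur where each "texel" reads its neighbours. If reads and writes go to the same buffer, the result depends on the order in which texels happen to be processed, and on a GPU that order is not defined. Reading from a separate source buffer removes the dependence.

```python
def blur_in_place(buf, order):
    """Read-modify-write on ONE buffer: a neighbour may already hold its new value."""
    n = len(buf)
    for i in order:
        left = buf[max(i - 1, 0)]
        right = buf[min(i + 1, n - 1)]
        buf[i] = (left + buf[i] + right) / 3
    return buf


def blur_separate(src):
    """Read only from src, write only to a new list: processing order cannot matter."""
    n = len(src)
    return [(src[max(i - 1, 0)] + src[i] + src[min(i + 1, n - 1)]) / 3
            for i in range(n)]
```

Running `blur_in_place([0.0, 3.0, 0.0, 3.0], ...)` front-to-back and back-to-front gives two different results, while `blur_separate` always gives `[1.0, 1.0, 2.0, 2.0]`.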

I think they should finally introduce gl_FramebufferColor (or whatever it is supposed to be called): a variable that holds the current framebuffer value.

Well, they will have their own reasons…

P.S. As far as I know, FBOs are not accelerated by SLI; it only applies to rendering to a window. This is why they made the GPU_affinity extension in the first place (though I don’t really get the point of it).

ZbuffeR · December 5, 2006, 1:52pm

Raystonn, what you mention looks like the (future) “Sample Shader” in this document:
http://www.gamedev.net/columns/events/gdc2006/article.asp?id=233

Raystonn · December 5, 2006, 1:54pm

ZbuffeR, that sounds sweet. It will be a great relief to be able to completely decouple the CPU from the GPU on branch-heavy shader code.

-Raystonn

Komat · December 5, 2006, 1:57pm

Originally posted by Raystonn:
For multipass with early-z branching, a pass is implicitly defined as anything occurring between a glBeginQuery(GL_SAMPLES_PASSED, …) and the accompanying glEndQuery(GL_SAMPLES_PASSED).

Actually, that is only what you consider a pass. Other people use the queries for different things and do not want them tied to cache flushes.
The proper way to mark something like a pass would be to add a dedicated function for it.

Raystonn · December 5, 2006, 1:57pm

ZbuffeR, I’m a bit skeptical about the details, though. If they continue to formally disallow any mixing of reads and writes to the same pixel, then we will need to swap the read and write textures each pass. Without explicit support for doing this for us, I don’t see the sample shader being very useful.

-Raystonn

Raystonn · December 5, 2006, 2:05pm

Komat, that is certainly a possibility. If we use two buffers and swap them (the current official way to do this), then caches are flushed every time they are swapped. If we use a single texture and read the current value before writing the new one, then we need a way to tell the GL to flush the cache so that the previous pass’s writes are guaranteed to be visible to the read. A new function is one way to do this.

Either way, when you multipass ping-pong you need the cache flushed for the targets you wrote to during the last pass.

-Raystonn
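
The two-buffer ("ping-pong") scheme can be sketched as a CPU simulation, assuming a hypothetical `shade(src, i)` callback that stands in for the fragment shader. Each pass reads only from `front` and writes only to `back`; the swap between passes is the point where a real implementation has to make the previous pass’s writes visible (i.e. flush caches).

```python
def run_passes(initial, num_passes, shade):
    """Ping-pong multipass: read from `front`, write every texel of `back`, swap.

    shade(src, i) is a hypothetical stand-in for the fragment shader; it may read
    any texel of src but its result only goes to texel i of the other buffer.
    """
    front = list(initial)
    back = [0] * len(initial)
    for _ in range(num_passes):
        for i in range(len(front)):
            back[i] = shade(front, i)
        # Swap roles: this pass's output becomes the next pass's input.
        # In GL, this is where the written target must become readable.
        front, back = back, front
    return front
```

With a shader that reads its left neighbour, `run_passes([1, 2, 3], 1, lambda src, i: src[i - 1] if i > 0 else src[i])` gives `[1, 1, 2]`; the same shader applied in place, left to right, would instead give `[1, 1, 1]`, because each texel would read a value already overwritten in that pass.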

Komat · December 5, 2006, 2:06pm

Originally posted by Zengar:

I think they should finally introduce gl_FramebufferColor (or whatever it is supposed to be called): a variable that holds the current framebuffer value.

That was considered in the GLSL 1.10 specification in two issues (#7 and #23). It was rejected because of speed concerns.

Raystonn · December 5, 2006, 2:07pm

If such functions were to be introduced, they should target individual write targets. This would allow you to flush a single texture to memory without overly impacting performance by forcing a flush on every write target. You ideally want to flush only the textures that both a) were written to last pass, and b) will be read during the next pass.

-Raystonn
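
That rule (flush only targets written last pass and read next pass) can be sketched as bookkeeping over a list of passes. This is a hypothetical illustration, not an existing GL entry point. One deliberate adjustment: it tracks a "dirty" set rather than looking only at the immediately preceding pass, so a target written in one pass and first read two passes later still gets flushed.

```python
def flush_schedule(passes):
    """passes: list of (reads, writes) pairs, each a collection of target names.

    Returns, for each pass, the sorted targets to flush just before it runs:
    targets written by an earlier pass, not flushed since, and read by this pass.
    """
    dirty = set()  # targets written since their last flush
    schedule = []
    for reads, writes in passes:
        to_flush = dirty & set(reads)
        schedule.append(sorted(to_flush))
        dirty -= to_flush        # now safe to sample as textures
        dirty |= set(writes)     # this pass's outputs are not yet visible
    return schedule
```

For passes `[({"A"}, {"B"}), ({"B"}, {"A", "C"}), ({"A"}, {"B"}), ({"C"}, {"D"})]` this yields `[[], ["B"], ["A"], ["C"]]`: target `C`, written in the second pass, is only flushed right before the fourth pass reads it, and `D` is never flushed because nothing reads it.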

Zengar · December 5, 2006, 2:37pm

@Komat: yup, I know, but it is still somewhat sad, seeing as it is supported by current cards (at least partially).

Korval · December 5, 2006, 3:51pm

For multipass with early-z branching, a pass is implicitly defined as anything occurring between a glBeginQuery(GL_SAMPLES_PASSED, …) and the accompanying glEndQuery(GL_SAMPLES_PASSED).
Wait, let me get this straight.

Now you want to bind FBO behavior to that of occlusion queries? How about they just put out an “EXT_write_Raystonns_code” extension while they’re at it?

No, this is way too special-case for it to be a reasonable request.

I don’t see how SLI would affect this.
cough

“The extension specification was written by people who know a lot more about how their hardware works than you do.”

The fact that you don’t see it is completely irrelevant to whether or not it is there. They’re saying that it is there, and that’s what matters.

We are basically forced to wait for the pass to finish rendering before beginning the next, as we need to know when the number of samples written reaches 0.
Hey, you choose the algorithm, not me and not the FBO authors. You should have made a better choice. And if no better choices make themselves available, then you bite down and accept what you’ve got.
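
The stall being discussed can be modelled like this (a Python simulation; `step` and `decrement_active` are hypothetical stand-ins for one draw wrapped in glBeginQuery(GL_SAMPLES_PASSED, …)/glEndQuery). In real GL the CPU would fetch the count with glGetQueryObjectuiv(query, GL_QUERY_RESULT, &count), which blocks until the pass has finished, so the next pass cannot be queued ahead of time. Note that the final pass writes no samples at all; it exists only to detect termination.

```python
def iterate_until_converged(state, step, max_passes=100):
    """Run passes until one writes zero samples (or max_passes is reached).

    step(state) -> (new_state, samples_passed) stands in for one draw inside a
    GL_SAMPLES_PASSED query followed by a *blocking* readback of the result.
    """
    passes = 0
    while passes < max_passes:
        state, samples = step(state)  # CPU waits here until the pass is done
        passes += 1
        if samples == 0:              # nothing was written: converged
            break
    return state, passes


def decrement_active(texels):
    """Example pass: only texels with value > 0 are shaded (early-z/discard
    culls the rest), and each shaded texel counts as one sample passed."""
    samples = sum(1 for v in texels if v > 0)
    return [max(v - 1, 0) for v in texels], samples
```

Starting from `[0, 2, 1]`, `iterate_until_converged([0, 2, 1], decrement_active)` returns `([0, 0, 0], 3)`: two passes do work and the third only confirms that nothing is left.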

Korval, I still can see that this actually works on current hardware without any synchronization problems.
Yes, and I’m sure that there’s other unspecified behavior that “just works” too. Feel free to use it, but don’t complain about the spec if in a later driver revision it ceases to work for you.

@Komat: yup, I know, but it is still somewhat sad, seeing as it is supported by current cards (at least partially).
Maybe. Kinda. In a limited number of tested cases. And with completely unspecified restrictions.

Hardly the foundation for a good extension.

Humus · December 8, 2006, 8:27pm

Originally posted by Zengar:
From what Humus said, I understand that it also works on ATI.
Not sure if I ever said it did. I actually don’t know if it does. It may work, or work in some cases, or may not work. I really don’t know. I’m not familiar with all the hardware details involved.

Humus · December 8, 2006, 8:36pm

Originally posted by Zengar:
I think they should finally introduce gl_FramebufferColor (or whatever it is supposed to be called): a variable that holds the current framebuffer value.
As much as I love the idea as a software guy, there are hardware reasons why this would be a problem in practice. The hardware really likes having a straight top-to-bottom pipeline: data enters at the top, gets processed, and is written out at the bottom, with no loopbacks. As soon as you loop data back, you’re introducing a helluva lot of coherency checks for the hardware to perform, which would be a problem for parallelism, plus it requires communication paths between units that otherwise don’t have to communicate. If you’re rendering to and texturing from the same surface within the same draw call, you’d need communication between the backend and the texture units.