I’m willing to take a performance hit for it if that’s necessary. It’s like depth-writes from a shader: they cause a large performance reduction, and they aren’t nearly as useful as this would be.
My concern is not that there will be a performance drop from using it. My concern is that the feature would require a restructuring of the entire back-end of the renderer, and that such restructuring would either prevent the use of performance-enhancing features (like having multiple quads in the pipe) or dramatically complicate the back-end logic, thus increasing the cost of the chip or costing us other, potentially useful, features.
Do we really expect that to be in the next-generation implementations?
DX Next will come out with Longhorn in 2006. As such, the API is going to be something of an indicator of the expected functionality of cards of that era. Not of the cards of next year.
Personally, I’d rather see floating-point blends and programmable texture fetching/filtering. Both are useful and neither would break parallelism.
Programmable texture fetching sounds like it’d be really slow, but floating-point blending is clearly something that would be of great value in the (near) future.
No, it isn’t because no, you can’t. That’s what we’re talking about.
The point I was making is that if you “move the color buffer read to an earlier stage, but not the blending unit as a whole,” then it is the same as what we are discussing. It isn’t an alternative to moving the blending into the fragment shader; it’s the exact same thing, because if you could read the framebuffer from the shader, you’d never use fixed-function blending again.
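To make the equivalence concrete, here’s a minimal GLSL sketch of what “reading the framebuffer from the shader” would look like. Note that `gl_LastFragColor` is hypothetical — no such built-in existed in desktop GL — it just stands in for a read of the destination pixel:

```glsl
// Hypothetical programmable blend: additive blending done by hand in
// the fragment shader. gl_LastFragColor is NOT a real built-in; it
// stands in for a read of the framebuffer at the fragment's position.
uniform sampler2D baseTex;
varying vec2 texCoord;

void main()
{
    vec4 src = texture2D(baseTex, texCoord);
    // The equivalent of glBlendFunc(GL_ONE, GL_ONE), in shader code:
    gl_FragColor = src + gl_LastFragColor;
}
```

Once that read exists, every fixed-function blend equation is just arithmetic in the shader, which is exactly why the two proposals collapse into one.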
Read access to fragment.position.z isn’t free either.
Read access is free (or, at least, pretty cheap). Write access isn’t, since it screws up all the fast z-culling hardware.
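For contrast, a sketch of the cheap read versus the expensive write, in standard GLSL of the period. Reading the interpolated depth costs little; touching the depth output is what forces the hardware to turn off its fast z-cull, since the final depth is no longer known before shading:

```glsl
// Reading gl_FragCoord.z is cheap -- the value is already interpolated.
// Writing gl_FragDepth is the expensive part: it defeats early z-culling.
void main()
{
    float z = gl_FragCoord.z;            // cheap read
    gl_FragColor = vec4(z, z, z, 1.0);
    gl_FragDepth = z * 0.5;              // arbitrary write, for illustration only
}
```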
and as an aside, read access to target contents is a lot more interesting than fragment.position.z
True. But read access to the framebuffer is much more difficult than simply giving the fragment program the computed z-depth.
I really wonder how often issue #2 crops up in reality. How bad is it, really? I honestly don’t know but I’d like to.
Well, it never happens on an ATi chip because the hardware isn’t designed to have multiple “quads” in-flight simultaneously. Apparently, this is not true for FX chips. I don’t imagine that it would come up too much, as you would have to have a pretty deep pipeline for it to happen, but the hardware designers would have to devote resources to preventing the problem in any case.
and you simply can’t do it with fixed function blending alone, unless, of course, you copy generous portions of your render target to a texture
Well, technically, you don’t have to “copy” it. With ATI_draw_buffers, you can write the color to the frame buffer and write the luminance to an AUX buffer. From there, assuming ARB_superbuffers, you just bind that buffer as a texture, and you can do regular blending as a post-process.
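A sketch of the first pass of that arrangement in GLSL, assuming ATI_draw_buffers-style multiple render targets via `gl_FragData`; the texture and coordinate names are illustrative:

```glsl
// Pass 1: write color to draw buffer 0 (the frame buffer) and luminance
// to draw buffer 1 (the AUX buffer), using ATI_draw_buffers-style MRT.
uniform sampler2D baseTex;
varying vec2 texCoord;

void main()
{
    vec4 color = texture2D(baseTex, texCoord);
    gl_FragData[0] = color;
    // Standard Rec. 601 luminance weights:
    gl_FragData[1] = vec4(dot(color.rgb, vec3(0.299, 0.587, 0.114)));
}
```

The second pass then binds the AUX buffer as a texture (the part that assumes superbuffers-style render-to-texture) and does whatever blending you like as a full-screen post-process — no copy required.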