Can early KIL (ARB FP) improve fillrate?

Interesting information. My only disappointment is the naive assumption by some that this is an oversight someone is to blame for. If that’s how the hardware works, SIMD parallelism just doesn’t accommodate this sort of thing: every fragment in a group executes the same instruction stream in lockstep, so killing one fragment early doesn’t free up any cycles for the others. It takes a radically different and more complex hardware implementation to exploit these opportunities for performance gains, and even then other issues may still limit what you can do.
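
To make the lockstep point concrete, here’s a toy cost model. It’s a hypothetical sketch, not a description of any real GPU: it assumes a group of four fragments (a quad), a made-up 20-instruction program (`PROGRAM_LENGTH`), and one cycle per instruction. `lockstep_cost` models plain SIMD with no flow control, and `early_out_cost` models hypothetical hardware that can retire the whole group once every lane has hit KIL.

```python
# Toy cost model (hypothetical numbers, not any real hardware).
PROGRAM_LENGTH = 20  # assumed instruction count of the fragment program


def lockstep_cost(kill_points, program_length=PROGRAM_LENGTH):
    """Cycles for one SIMD group with no dynamic flow control.

    Every lane executes every instruction; a killed lane's results are
    just discarded, so the group always pays for the full program.
    """
    return program_length


def early_out_cost(kill_points, program_length=PROGRAM_LENGTH):
    """Cycles if the group can stop once *all* of its lanes are killed.

    kill_points[i] is the instruction index at which lane i executes KIL,
    or None if that lane survives to the end of the program.
    """
    if any(k is None for k in kill_points):
        return program_length  # one surviving lane keeps the whole group busy
    return max(kill_points) + 1  # group retires after the last lane's KIL


if __name__ == "__main__":
    all_killed = [3, 3, 3, 3]
    one_survivor = [3, 3, 3, None]
    print(lockstep_cost(all_killed), early_out_cost(all_killed))      # 20 4
    print(lockstep_cost(one_survivor), early_out_cost(one_survivor))  # 20 20
```

Even in the idealized early-out case, a single surviving fragment in the group wipes out the savings, which is one of the “other issues” that can limit the gain.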

I imagine that once you get hardware that can do looping in fragment programs, you’ll see a performance benefit from early-outs.