Further, on the subject of programmability in GPUs, I think it's obvious that someday the benefits of a fully programmable, general-purpose GPU will become clear, and hardware manufacturers will start producing them. So far, the cost in speed this would incur has not been outweighed by the benefits in flexibility, but I expect that to change. My guess is that the first step will be something like the removal of the limit on the number of instructions in vertex programs and pixel shaders.
In the long run, though, it seems to me that once we have fully programmable GPUs, limiting them to graphics alone will seem ludicrous. What if you could encode your physics algorithms as a “vertex program” and let the card process your physics for you? If we ever see floating-point framebuffers and a fully generalized pipeline, then I think GPUs will become applicable to far more diverse problems than just graphics. At that point it makes sense to move toward a new computer architecture in which a scalar CPU and a massively SIMD GPU work together as multiprocessors, with shared and exclusive memory for each. Programs would consist of machine code for both instruction sets, designed to work in tandem. Or maybe this kind of massively parallel computation will become part of standard CPUs, so that we move back to the days when the CPU was responsible for all graphics work.
My reasoning is that as GPUs become more flexible we will want to apply them to more problems, and the CPU<=>GPU bus will remain the limiting factor, until the only suitable solution is to bring the two closer together…