I just read the paper available at http://graphics.stanford.edu/papers/rtongfx/rtongfx.pdf . These guys have done a great job!
Anyway, here is a quote from the 9th page, which might lead to interesting ideas…
Once the basic feasibility of ray tracing on a GPU has been demonstrated, it is interesting to consider modifications to the GPU that support ray tracing more efficiently.
Many possibilities immediately suggest themselves. Since rays are streamed through the system, it would be more efficient to store them in a stream buffer than a texture map.
This would eliminate the need for a stencil buffer to control conditional execution. Stream buffers are quite similar to F-buffers which have other uses in multipass rendering (Mark and Proudfoot 2001).
Our current implementation of the grid traversal code does not map well to the vertex program instruction set, and is thus quite inefficient.
Since grid traversal is so similar to rasterization, it might be possible to modify the rasterizer to walk through the grid.
Finally, the vertex program instruction set could be optimized so that ray-triangle intersection could be performed in fewer instructions.
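
To make the last two points of the quote a bit more concrete, here is a rough CPU-side sketch in Python of the two operations it mentions. This is not the paper's GPU code. I am assuming a uniform grid of unit-sized cells whose corner sits at the origin, a standard 3D DDA walk (the voxel analogue of stepping from pixel to pixel along a line), and the two-sided Moller-Trumbore ray-triangle test. The names grid_cells and ray_triangle and the max_steps and eps parameters are made up for illustration.

```python
import math

def grid_cells(origin, direction, grid_size, max_steps=64):
    """Yield the (x, y, z) cells a ray visits, in order (3D DDA walk).

    Assumes unit-sized cells and a grid whose corner is at (0, 0, 0).
    """
    if not any(direction):
        raise ValueError("direction must be non-zero")
    cell = [math.floor(c) for c in origin]
    step, t_max, t_delta = [], [], []
    for o, d, c in zip(origin, direction, cell):
        if d > 0:
            step.append(1)
            t_max.append((c + 1 - o) / d)   # ray distance to the next cell wall
            t_delta.append(1 / d)           # ray distance between walls on this axis
        elif d < 0:
            step.append(-1)
            t_max.append((c - o) / d)
            t_delta.append(-1 / d)
        else:
            step.append(0)
            t_max.append(math.inf)          # the ray never crosses a wall on this axis
            t_delta.append(math.inf)
    for _ in range(max_steps):
        if not all(0 <= c < n for c, n in zip(cell, grid_size)):
            return                          # the ray has left the grid
        yield tuple(cell)
        axis = t_max.index(min(t_max))      # the nearest wall decides the next step
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the hit distance t along the ray, or None on a miss.

    Two-sided Moller-Trumbore test (no back-face culling).
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                         # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv                     # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv             # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

if __name__ == "__main__":
    for c in grid_cells((0.5, 0.25, 0.5), (1, 1, 0), (3, 3, 1)):
        print(c)
    print(ray_triangle((0.25, 0.25, -1), (0, 0, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0)))
```

The traversal loop is just a compare, an add and a branch per cell, which is the part that resembles what a rasterizer already does per pixel. The intersection test is mostly dot and cross products, which is presumably where a tuned instruction set could save instructions.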