Look at the NV_fence extension; it has some relevance here, though it’s an NVIDIA-only extension. It turns your question on its head, because it may seem to be about keeping the CPU busy, but if you think about it, it’s the same question: you want to keep both busy and balance the load.
In general, though, you want to mix CPU work with graphics, so it’s not exactly the case that you want to draw everything, then compute, then swap.
Ideally you want to keep the graphics busy while you compute, without wasting time blocking on a full graphics FIFO or starving graphics while your application runs. The former may be unlikely unless you have lots of data to draw. You’re thinking along the right lines, but here are some simple suggestions.
You can easily clear the screen, then compute or poll input, then draw. I also like to block on swap before I handle my input, to keep latency low. Some don’t.
You could also try to interleave compute with draw, for example drawing a sky/background polygon, then doing compute, then drawing the rest. But be warned: on coarse-Z hardware like GeForce3 you may want to draw the backdrop last, because it will be occluded and therefore faster when drawn last.
Finally, if you have a lot of compute, you want to interleave it with graphics a lot, but you don’t want to starve graphics, and this is what the NV_fence extension is for. Basically you send the pipe a token and can query when the token has finished in graphics hardware without a glFinish or some other blocking command.
So pseudocode looks something like this:
// stuff some stuff down the pipe before
// first fence token
// set up fence and send stuff to
// keep GPU busy after fence token done
// we’ll only send more data when
// fence is finished in hardware
// this loop keeps CPU
// busy while GPU is drawing
// This could be compute for next frame
// or maybe some stuff for this frame
// such as on the fly culling, the more
// compute the more culling you can do
// while you’re waiting, which is a
// nice balance
// note that compute has to
// be done in bite-sized chunks
// swapbuffers and perhaps block
// optional block to keep the input
// low latency
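
To make the shape of that loop concrete, here is a rough sketch in C. It is not real GL code: the fence is faked with a counter so the control flow can run anywhere, and every name in it (FakeFence, fake_set_fence, fake_test_fence, submit_batch, do_compute_chunk, run_frame) is made up for illustration. In a real program the two fake fence calls would presumably be glSetFenceNV(fence, GL_ALL_COMPLETED_NV) and glTestFenceNV(fence), and submit_batch would be your actual draw calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GPU fence: it reports "finished" only after
   a fixed number of polls. Real code would use glSetFenceNV/glTestFenceNV. */
typedef struct {
    int polls_until_done;   /* polls left before the token counts as passed */
} FakeFence;

static void fake_set_fence(FakeFence *f, int polls)
{
    f->polls_until_done = polls;
}

static bool fake_test_fence(FakeFence *f)
{
    if (f->polls_until_done > 0) {
        f->polls_until_done--;
        return false;       /* GPU has not reached the token yet */
    }
    return true;            /* token has passed through the pipe */
}

typedef struct {
    int batches_submitted;      /* chunks of drawing sent down the pipe */
    int compute_chunks_done;    /* bite-sized CPU work done while waiting */
} FrameStats;

/* Hypothetical placeholders for real draw calls and real application work. */
static void submit_batch(FrameStats *s)     { s->batches_submitted++; }
static void do_compute_chunk(FrameStats *s) { s->compute_chunks_done++; }

/* One frame: draw in batches, and do CPU work only while the GPU is
   still chewing on what was already sent. */
FrameStats run_frame(int num_batches, int polls_per_fence)
{
    FrameStats stats = {0, 0};
    FakeFence fence;

    assert(polls_per_fence >= 0);
    if (num_batches <= 0)
        return stats;

    /* stuff some work down the pipe before the first fence token */
    submit_batch(&stats);

    for (int i = 1; i < num_batches; i++) {
        /* token goes in behind the work already queued */
        fake_set_fence(&fence, polls_per_fence);

        /* more work behind the token keeps the GPU busy once it passes */
        submit_batch(&stats);

        /* keep the CPU busy in small chunks until the token is done */
        while (!fake_test_fence(&fence))
            do_compute_chunk(&stats);
    }

    /* swap buffers here, and optionally block to keep input latency low */
    return stats;
}
```

The point of queuing one batch behind the token before polling is that the GPU always has something waiting when the fence passes, so it never starves while the CPU finishes its current chunk. The smaller the compute chunks, the sooner you notice the fence and feed the pipe again.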