I believe the problem (under Windows) is this:
- timeGetTime() is easy to call, but returns only milliseconds (which at 50 fps gives an error of up to 5%!) and takes a long time to execute.
- QueryPerformanceCounter() returns microsecond-or-better resolution, but is slightly harder to call (because you need to divide by the frequency from QueryPerformanceFrequency()) and will occasionally step forward in time by > 4 seconds. It is faster than timeGetTime(), but still takes a good two microseconds on a P-III 1 GHz.
- RDTSC is very fast (I measured 47 nanoseconds) and cycle-accurate. However, it is the hardest to call (you have to define a "naked" assembly function) and you have to find out the effective CPU speed from somewhere (the system registry, or by measuring it yourself).
What I end up doing is using RDTSC, but recalibrating it against timeGetTime() every so often. If I do this calibration every 100 frames, that 5% error shrinks to 0.05%, which I can live with. The drawback is that the calibration takes some time to warm up, so until then I have to take a wild guess at the CPU speed by spinning for 10-20 milliseconds and counting cycles (which gives a good 10% error up front, worst case).