How is GL's multmatrix so fast?

Thanks for investigating it.
No, vsync is disabled.

I’ve noticed something else though (unrelated) - if I have a Sleep(1) in my render thread, I can get nothing more than 100 fps - without it I get > 1000 fps - why is this? Sleep(1) should only sleep for 1 millisecond, surely?
I’ve lowered the priority of my render thread and removed the sleep, and now I get sort of > 500 fps.

Thanks for the program. Much more refined than mine was.

It’s also nice to see that my routine pulled ahead a little bit. On my 1.7 GHz P4 Xeon at work, it was about 33% faster (though still slightly slower than the OGL one).

– Zeno

Hmm, yes, Sleep(1) should only sleep for 1 ms.

BUT calling Sleep() also puts the thread at the end of the scheduler's list of runnable threads.

For example, Sleep(0) is usually used to give a different thread the CPU time remaining in the current thread's time slice.

So Sleep(1) can cause your thread to be suspended for longer than 1 ms (check the documentation).

Hope that helps,

LG

Thx Zeno

I think the OpenGL version is faster because the glLoadMatrixf call brings some of the memory used by the following glMultMatrixf call into the L1 cache.

Regards,

LG

Just a comment: Sleep(1) gives a delay of approx. 16-18 milliseconds. The system cannot switch faster…

Another comment.

If you know how the matrix is built (e.g. from rotations, translations, etc.), you do not need all the adds and muls…
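To illustrate the point above, here is a minimal sketch of a specialized multiply. It assumes both matrices are affine (bottom row 0 0 0 1, as produced by rotations, translations and scales) and stored column-major as OpenGL does. The function names mul_full and mul_affine are made up for this example and are not taken from anyone's code in this thread.

```cpp
// Full 4x4 multiply: out = a * b, column-major (element (row r, col c) is m[c*4 + r]).
// Uses 64 multiplies. out must not alias a or b.
void mul_full(const float* a, const float* b, float* out) {
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    }
}

// Specialized multiply, valid ONLY if both a and b have bottom row 0 0 0 1.
// Uses 36 multiplies instead of 64. out must not alias a or b.
void mul_affine(const float* a, const float* b, float* out) {
    // Upper-left 3x3 block (rotation/scale part).
    for (int c = 0; c < 3; ++c) {
        for (int r = 0; r < 3; ++r) {
            out[c * 4 + r] = a[0 * 4 + r] * b[c * 4 + 0]
                           + a[1 * 4 + r] * b[c * 4 + 1]
                           + a[2 * 4 + r] * b[c * 4 + 2];
        }
        out[c * 4 + 3] = 0.0f;  // bottom row entry is known in advance
    }
    // Translation column: a's 3x3 block applied to b's translation, plus a's translation.
    for (int r = 0; r < 3; ++r) {
        out[12 + r] = a[0 * 4 + r] * b[12]
                    + a[1 * 4 + r] * b[13]
                    + a[2 * 4 + r] * b[14]
                    + a[12 + r];
    }
    out[15] = 1.0f;
}
```

If either input has a projective bottom row (e.g. a perspective matrix), mul_affine gives wrong results and the full version is needed.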

My experience with Sleep() on my Win2K system:
Sleep(n) with n = 1 to 10 --> effective sleep time is 10 ms
Sleep(n) with n = 11 to 20 --> effective sleep time is 20 ms
Sleep(n) with n = 21 to 30 --> effective sleep time is 30 ms

Note : program running with default priority.
Yes, the doc says that n should be the time the thread is suspended…
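For anyone who wants to check this on their own machine, here is a small sketch that measures how long a requested sleep really takes. It uses std::this_thread::sleep_for and std::chrono (C++11) as a portable stand-in for the Win32 Sleep() call, which is an assumption on my part; the two need not behave identically, and this will not compile on VC 6.0. The name measure_sleep_ms is made up for the example.

```cpp
#include <chrono>
#include <thread>

// Requests a sleep of requested_ms milliseconds and returns how long
// the thread was actually away, in milliseconds.
double measure_sleep_ms(int requested_ms) {
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
    const clock::time_point stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling measure_sleep_ms(n) for n = 1 to 30 and printing the results should show whether your system rounds up in steps like the ones listed above. The numbers will differ by OS, timer resolution and system load.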

I think Knackered gets his matrices from quaternions, so using specialized multiplication code (which indeed can be a lot faster) won't work too well.

As for Sleep(), the time until the thread resumes depends on various factors and is hard to predict. So far I have seen almost everything between 10 ms and 50 ms (on a heavily loaded system).

Regards,

LG

Info… Standard code on my PIII 600 MHz gives 2.9 million 4x4 matrix mults, where the matrices have random numbers in each position, so the full 4x4 multiply is needed. Plain C++, VC 6.0, Release build.
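A rough sketch of how such a measurement could be set up is below. This is not the code used for the figure above. It assumes the figure is a per-second rate (the post does not say), uses std::chrono (C++11, so not VC 6.0), and the names mul4x4 and mults_per_second are made up for the example.

```cpp
#include <chrono>
#include <cstdlib>

// Plain full 4x4 multiply, column-major. out must not alias a or b.
void mul4x4(const float* a, const float* b, float* out) {
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    }
}

// Times `iterations` multiplies of random-filled matrices and returns
// the measured rate in multiplies per second (0 if the timer saw no time pass).
double mults_per_second(int iterations) {
    float a[16], b[16], out[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = static_cast<float>(std::rand()) / RAND_MAX;
        b[i] = static_cast<float>(std::rand()) / RAND_MAX;
    }
    volatile float sink = 0.0f;  // keeps the optimizer from discarding the results
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        a[i & 15] = (i & 255) / 256.0f;  // change one input so the call cannot be hoisted
        mul4x4(a, b, out);
        sink = out[i & 15];
    }
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    return seconds > 0.0 ? iterations / seconds : 0.0;
}
```

Because the same three small arrays are reused every iteration, everything stays in cache, so this measures the best case. Results depend heavily on compiler, optimization flags and hardware.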

If I do a local storage of tmp I get lower values, probably because most of the time all my matrices are already in the cache.