I am trying to measure the performance of an OpenGL application (also using Cg fragment/vertex shader elements). At the moment I am using the Pentium assembly ‘rdtsc’ instruction, which simply returns the cycle count, calling it around the rendering loop as follows:
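start = rdtsc();              // read the CPU time-stamp counter
glBegin(GL_QUADS);
glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
glTexCoord2f(float(xsize), 0.0f); glVertex2f(1.0f, -1.0f);
glTexCoord2f(float(xsize), float(ysize)); glVertex2f(1.0f, 1.0f);
glTexCoord2f(0.0f, float(ysize)); glVertex2f(-1.0f, 1.0f);
glEnd();
glFinish();                   // block until the GPU has finished the quad
end = rdtsc();
time = time + (end - start);  // accumulate elapsed cycles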
Doing this I am getting some (perhaps unbelievably) fast rendering times. Are there any other considerations that I need to take into account?
I am not rendering to texture, and I ensure the window is fully visible on the screen during execution (to prevent clipping).
Thank you for your help in advance.
How fast are the times? On what GPU? How big is that quad (approximate number of pixels on screen)?
Basically your code looks OK: you did call glFinish after the test. I would also put a glFinish call before the start = rdtsc(); line to ensure that no pending OpenGL operation inflates the measured time.
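To illustrate the glFinish placement, here is a minimal sketch. The function name time_draw_ms is made up for this example, and the "finish" and "draw" steps are passed in as callables so the snippet compiles without an OpenGL context; in a real program "finish" would call glFinish. It uses std::chrono instead of rdtsc purely for portability.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: times one batch of rendering work in milliseconds.
// `finish` must block until the GPU is idle (glFinish in a real program).
// `draw` issues the rendering commands to be measured.
double time_draw_ms(const std::function<void()>& finish,
                    const std::function<void()>& draw) {
    using clock = std::chrono::steady_clock;

    finish();                         // drain work queued before the measurement
    const auto start = clock::now();

    draw();                           // the commands being timed
    finish();                         // wait until the GPU has completed them

    const auto end = clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

With OpenGL headers available, a call might look like time_draw_ms([]{ glFinish(); }, drawQuad), where drawQuad is a placeholder for your own rendering function.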
I can see that you add (end - start) to time. I assume this is part of a loop that repeats the test a few times. Is your time variable initialized properly, and are the start, end and time variables of a proper type to use with rdtsc?
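On the question of types: the time-stamp counter is a 64-bit value, so 64-bit unsigned variables are the safe choice. A rough sketch of an accumulator follows; CycleAccumulator and truncate_to_32 are names invented for this example, and the cycle counts would come from your own rdtsc wrapper.

```cpp
#include <cstdint>

// Hypothetical accumulator for elapsed cycle counts over repeated runs.
struct CycleAccumulator {
    std::uint64_t total = 0;    // zero-initialised before the loop starts
    std::uint64_t samples = 0;

    // start and end are raw counter readings taken around one run.
    void add(std::uint64_t start, std::uint64_t end) {
        total += end - start;   // unsigned 64-bit difference of the two readings
        ++samples;
    }

    // Mean cycles per run, or 0 if nothing has been recorded yet.
    double mean() const {
        return samples ? static_cast<double>(total) / static_cast<double>(samples)
                       : 0.0;
    }
};

// Shows what is kept if a reading is stored in a 32-bit variable:
// only the low 32 bits survive, the upper half is silently dropped.
std::uint32_t truncate_to_32(std::uint64_t cycles) {
    return static_cast<std::uint32_t>(cycles);
}
```

As a feel for the scale, on an assumed 3 GHz CPU the low 32 bits of the counter wrap roughly every 1.4 seconds, so 32-bit variables can give nonsense totals in a longer test.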
Also try glGetError to check if your code executed properly.
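A sketch of such a check is below. More than one error flag can be set, so glGetError is normally called in a loop until it returns GL_NO_ERROR, and it should be called outside a glBegin/glEnd pair. The helper names gl_error_name and drain_gl_errors are invented here, and the error source is passed in as a callable so the snippet builds without OpenGL headers; the numeric values are the standard ones from gl.h.

```cpp
#include <functional>
#include <string>
#include <vector>

// Maps an OpenGL error code to its symbolic name.
const char* gl_error_name(unsigned code) {
    switch (code) {
        case 0x0000: return "GL_NO_ERROR";
        case 0x0500: return "GL_INVALID_ENUM";
        case 0x0501: return "GL_INVALID_VALUE";
        case 0x0502: return "GL_INVALID_OPERATION";
        case 0x0503: return "GL_STACK_OVERFLOW";
        case 0x0504: return "GL_STACK_UNDERFLOW";
        case 0x0505: return "GL_OUT_OF_MEMORY";
        default:     return "unknown error";
    }
}

// Hypothetical helper: calls `get_error` (glGetError in a real program)
// until it reports GL_NO_ERROR and returns the names of all errors seen.
std::vector<std::string> drain_gl_errors(const std::function<unsigned()>& get_error) {
    std::vector<std::string> names;
    for (unsigned e = get_error(); e != 0; e = get_error()) {
        names.push_back(gl_error_name(e));
    }
    return names;
}
```

In a real program the call could be drain_gl_errors([]{ return glGetError(); }), made once after the timed section so the check itself does not affect the measurement.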
It makes no difference whether your window is visible if you render to the back buffer.
Perhaps you have your modelview/projection matrices set up incorrectly or you have culling / clipping enabled - do you actually see that quad when you swap buffers?
If you’re on Mac OS X then you can use the OpenGL Profiling tool. Sorry if you’re not.
Thank you both for your replies. The performance I am getting is 2.3 ms for a quad of size 1024*1024 (about 400-500 MP/s), for a 3x3 mask filter. This is on a GeForce 6800 GT GPU.
You are correct: I loop the rendering 50 times to get an average value for throughput time. Before the loop, time is initialised to 0.
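For cross-checking figures like these, the arithmetic can be sketched as follows. The function names are invented for this example, and the CPU frequency needed to turn rdtsc cycles into time is something you have to supply yourself. At 400 to 500 MP/s, a 1024*1024 quad works out to roughly 2.1 to 2.6 ms per pass.

```cpp
// Time per pass in milliseconds implied by a fill rate in megapixels/second.
double frame_ms_from_fill_rate(double pixels, double megapixels_per_s) {
    return pixels / (megapixels_per_s * 1e6) * 1e3;
}

// Fill rate in megapixels/second implied by a time per pass in milliseconds.
double fill_rate_mp_per_s(double pixels, double frame_ms) {
    return pixels / (frame_ms * 1e-3) / 1e6;
}

// Converts an averaged cycle count to milliseconds.
// cpu_hz is an assumption the caller must provide (e.g. 3.0e9 for 3 GHz).
double cycles_to_ms(double mean_cycles, double cpu_hz) {
    return mean_cycles / cpu_hz * 1e3;
}
```

The mean_cycles argument would be the accumulated time divided by the number of loop iterations (50 in this case).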
I will try double buffering today as you suggest and will check the other parameters you mention.
Oh, to answer the other post: I am implementing this on a Windows platform.
Thank you again for your assistance.