GLBench

I decided to move my reply from yesterday to a new thread so as not to hijack someone else's question…

I’ve uploaded the new version of GLBench with source.
http://www.adrian.lark.btinternet.co.uk/GLBench.htm

It also now benchmarks vertex transform, fillrate and readpixels+pdr.

Someone sent me results for the 9800 Pro which had some numbers that at first looked wrong.

ZBuffer clear speed of 100GB/sec? http://www.adrian.lark.btinternet.co.uk/GLBench6.htm

Now I realise this is probably ATI’s fast zclear in action.

The only numbers that still look wrong are the transform speeds for the cube: 4M vertices/second is very slow, and it's odd that every test has the same speed. Must be a bug in my app, I guess, but I can't spot it. The code in question is in the TNLCube function.


I think I have got to the bottom of the problem.

I was drawing the cube close to the viewer. This shouldn't have mattered since I had culling of both front and back faces enabled. On my GeForce there was no fillrate issue, but on the ATI it seems like there is some fillrate overhead. It would explain why the results for immediate mode, vertex arrays and display lists were identical and low.

The mesh transform test wasn't affected because that was already positioned well away from the viewpoint.
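Roughly what the transform test does now (a simplified sketch, not the exact GLBench code):

/* Simplified sketch of the transform-only setup: cull both front and back
 * faces so triangles are discarded after transform, and keep the geometry
 * well away from the viewpoint so that drivers which still pay some
 * fill-related cost have almost nothing to rasterise. */
glEnable(GL_CULL_FACE);
glCullFace(GL_FRONT_AND_BACK);       /* no polygons reach the rasteriser   */
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0f, 0.0f, -1000.0f);  /* push the cube far from the viewer  */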

I have made the change and updated the file. Can someone with an ATI try it out? Thanks.

I’ve been trying to get this to build on a linux-based system, but I’m stuck with getting the extension stuff working. I must confess I haven’t messed around very much with extensions, so I don’t really know what I’m doing.

In glx.h, I have:

extern void *glXAllocateMemoryNV(GLsizei size, GLfloat readfreq, GLfloat writefreq, GLfloat priority);

But I don’t have a corresponding:
PFNGLXALLOCATEMEMORYNVPROC
defined anywhere (that I can tell)

(whereas I do have declarations for both:
glFlushVertexArrayRangeNV -and-
PFNGLFLUSHVERTEXARRAYRANGENVPROC)

So the normal (?) glXGetProcAddressARB business doesn’t work… I tried to create my own typedef for PFNGLXALLOCATEMEMORYNVPROC

…which I figure I shouldn’t be doing anyway – and I couldn’t make it work. Again I will confess that I don’t exactly know what is supposed to be going on here.

I get an error such as:
cannot convert void (*) (int, float, float, float) to void *() (int, float, float, float) in assignment

Maybe I don’t even need to be doing this glXGetProcAddress call?

-Steve
(Not trying to hijack your thread.)

Originally posted by Stephen Webb:
I've been trying to get this to build on a linux-based system, but I'm stuck with getting the extension stuff working. …

If you don’t know how to use extensions, check http://glew.sf.net or some other extension loading lib or some tutorial…
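If you'd rather keep loading it by hand, the missing typedef looks roughly like this (a sketch modelled on glxext.h, so double-check the exact signature against your own headers; the function name below is just for illustration). The usual gotcha is that it has to be a pointer to a function returning void *:

#include <GL/glx.h>

typedef void * (*PFNGLXALLOCATEMEMORYNVPROC)(GLsizei size, GLfloat readfreq,
                                             GLfloat writefreq, GLfloat priority);

static PFNGLXALLOCATEMEMORYNVPROC pglXAllocateMemoryNV;

void loadVARAllocator(void)
{
    /* the cast to the function-pointer typedef is what makes the assignment legal */
    pglXAllocateMemoryNV = (PFNGLXALLOCATEMEMORYNVPROC)
        glXGetProcAddressARB((const GLubyte *)"glXAllocateMemoryNV");
}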

I’m not that great with extensions myself but I put all the extension headers that I used into a zip which might help you. http://www.adrian.lark.btinternet.co.uk/glext.zip

Originally posted by Adrian:
I’m not that great with extensions myself but I put all the extension…

I appreciate the help. I just got done compiling and running GLBench 1.1-linux…

My results can be found here: http://web.qx.net/lizjones/steve/GLBench_results

If anyone else wants to run this on their linux system, I’ll roll up what I have and let you have a go at it (subject to Adrian’s approval, of course)…

I’m too tired to try to use the code tonight, but I can’t wait to get my glTexSubImage2D to run this fast:

TEXIMAGE TESTS (MB/sec)
glTexImage2D 2246 MB/sec
glTexSubImage2d 2320 MB/sec
glCopyTexImage2D 3571 MB/sec
glCopyTexSubImage2D 3486 MB/sec

Somehow I suspect something isn’t quite right here. I won’t be too surprised if I drop back to the 133 MB/sec range I have been seeing lately. But I can dream, can’t I?

I’m not exactly sure what is going on “under the covers” but I was able to achieve 500-800 MB/sec on my system (with no special memory allocation tricks, etc) – but as soon as the texture was actually being used, I dropped to about 130 MB/sec.

The only change in the code was the addition of this line:

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

before -> no texture rendered -> 600+ MB/sec
after -> texture rendered correctly -> 130 MB/sec
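My guess (and I may be wrong) is that without that line the default mipmapping min filter leaves a single-level texture incomplete, so it is never actually sampled and the upload looks nearly free. A minimal sketch of the pattern, with placeholder size and data:

/* Sketch only: with the default GL_NEAREST_MIPMAP_LINEAR min filter and no
 * mipmap levels, this texture is incomplete and never gets sampled; setting
 * GL_NEAREST makes it complete, so it really is used during rendering. */
GLuint tex;
GLubyte pixels[256 * 256 * 4];                 /* placeholder image data */
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 256, 256, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);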

Maybe this part of the discussion belongs back in the other thread…

-Steve
Adrian – once again, I want to thank you for making this program available to me, as it has been very helpful both for performance evaluation and code examples.

I just upgraded my drivers to 56.55 and they seem to be faster with glTexSubImage2D. I am now getting a speed of 330MB/sec instead of 270MB/sec.

The figures of around 2000 MB/sec you are getting for teximage definitely look wrong to me.

This test doesn’t seem to be very trustworthy. I can only wonder about the glMultMatrix time in Stephen’s report. In addition, all the glEnable/glDisable tests probably measure the driver’s state management code on the CPU, nothing else. Just enabling some states, textures or whatever, and not using them, will definitely lead to measuring only the CPU work done by the driver…

Y.

More results, if anyone is interested:

1024x1024 texture copies:
TEXIMAGE TESTS (MB/sec)
glTexImage2D 617 MB/sec
glTexSubImage2d 608 MB/sec

512 x 512 texture copies:
TEXIMAGE TESTS (MB/sec)
glTexImage2D 877 MB/sec
glTexSubImage2d 849 MB/sec

Those numbers seem a whole lot more down to earth (though still high, for me) as compared to the 256x256 test. Any ideas what’s going on?

256 x 256 texture copies:
TEXIMAGE TESTS (MB/sec)
glTexImage2D 2312 MB/sec
glTexSubImage2d 2289 MB/sec

-Steve

Originally posted by Stephen Webb:
If anyone else wants to run this on their linux system, I’ll roll up what I have and let you have a go at it (subject to Adrian’s approval, of course)…

Yes that’s fine.

Originally posted by Ysaneya:
This test doesn’t seem to be very trustworthy. I can only wonder about the glMultMatrix time in Stephen’s report. In addition, all the glEnable/glDisable tests probably measure the driver’s state management code on the CPU, nothing else.

Yes, it's weird, but that's why I made the benchmark public: to identify bugs that didn't show up on my system. Actually I can't see a bug that could be causing that.

I know the enable/disable tests are only measuring the CPU; the majority of the overhead in state changes seems to be on the CPU anyway. It is better to have some idea than no idea, IMO.

One thing I noticed that was odd about PDR glReadPixels was that the download rate depended on the data. The way I got more sensible numbers was (rough sketch after the list):

a) load a bitmap from disk
b) draw the image
c) benchmark readpixels using the image
d) draw the image
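
In code it was roughly like this (drawImage() and getTimeSeconds() are just placeholders for my own helpers):

/* The framebuffer must contain real image data before the readback is
 * timed, otherwise the rate depends on whatever happened to be there. */
double benchmarkReadPixels(void)
{
    static GLubyte buffer[512 * 512 * 4];
    int i;

    drawImage();                          /* b) draw the bitmap loaded from disk   */
    glFinish();
    double t0 = getTimeSeconds();         /* placeholder timer                     */
    for (i = 0; i < 100; ++i)             /* c) benchmark readpixels on that image */
        glReadPixels(0, 0, 512, 512, GL_RGBA, GL_UNSIGNED_BYTE, buffer);
    glFinish();
    double seconds = getTimeSeconds() - t0;
    drawImage();                          /* d) draw the image again               */
    return (100.0 * 512 * 512 * 4) / (seconds * 1024.0 * 1024.0);   /* MB/sec      */
}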

-Won

Originally posted by Adrian:
I know the enable/disable tests are only measuring the CPU, the majority of the overhead in state changes seems to be on the CPU anyway.
You’re measuring nothing, really. You’re specifically not measuring any of the state change overhead (CPU or not) that will occur in a real renderer.
You need to learn more about state machine design.

What this glEnable/glDisable benchmark measures is essentially something along these lines:
1) function pointer dereference and call
2) argument validation
3) store arguments to state structure
4) bitwise OR some flag (state dirty)
5) return

But it doesn’t actually do anything until the next glBegin or equivalent. Step through there with a debugger if you’re so inclined. You’ll see.
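
If it helps, a toy model of what I mean (obviously not any real driver's code):

typedef struct {
    unsigned int enabled;   /* bitmask of enabled capabilities            */
    int dirty;              /* set whenever state changed since last draw */
} ToyContext;

static ToyContext ctx;

void toy_glEnable(unsigned int capBit)
{
    ctx.enabled |= capBit;  /* validate, store the argument, mark dirty    */
    ctx.dirty = 1;          /* then return; no hardware work happens here  */
}

void toy_glBegin(void)
{
    if (ctx.dirty) {
        /* only here would the accumulated state actually be programmed
         * into the hardware, which is the cost the benchmark never sees */
        ctx.dirty = 0;
    }
}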

OK, I might add a glBegin/glEnd in the loop and then subtract the cost of the begin/end. Either that or remove the state tests entirely.
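
Something along these lines is what I have in mind (just a sketch; getTimeSeconds() is a placeholder timer):

/* Time a loop of enable/disable plus a minimal begin/end, then subtract a
 * loop of begin/end alone, so each state change is actually flushed into a
 * draw.  Usage: stateCost = timeToggles(1) - timeToggles(0). */
double timeToggles(int toggleState)
{
    int i;
    double t0 = getTimeSeconds();
    for (i = 0; i < 100000; ++i) {
        if (toggleState) {
            if (i & 1) glDisable(GL_ALPHA_TEST);  /* alternate so each draw  */
            else       glEnable(GL_ALPHA_TEST);   /* sees a real state change */
        }
        glBegin(GL_TRIANGLES);                    /* minimal draw to flush it */
        glVertex2f(0.0f, 0.0f);
        glVertex2f(0.0f, 0.0f);
        glVertex2f(0.0f, 0.0f);
        glEnd();
    }
    glFinish();
    return getTimeSeconds() - t0;
}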

I think if you want to test the performance of a state, you need to measure the time it takes to draw a scene with this state enabled… and not just a single triangle. Do not forget a couple of well-placed glFinish() calls to be sure your measurements are correct. When you enable the alpha test, for example (all other states being the same), are you interested in the cost of enabling/disabling the state, or the cost of rendering your scene with alpha test enabled, as opposed to not using it at all? Personally, it's the latter.
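
Something like this, for example (drawScene() and the timer are placeholders); the difference between the two timings is what the state really costs:

double timeSceneWithAlphaTest(int enabled)
{
    int frame;
    if (enabled) glEnable(GL_ALPHA_TEST);
    else         glDisable(GL_ALPHA_TEST);
    glFinish();                           /* drain any pending work first       */
    double t0 = getTimeSeconds();         /* placeholder timer                  */
    for (frame = 0; frame < 100; ++frame)
        drawScene();                      /* placeholder: draw the whole scene  */
    glFinish();                           /* make sure the GPU has finished too */
    return getTimeSeconds() - t0;
}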

Y.

I’m just interested in the enable/disable cost. I think a single glBegin/glEnd call will be fine.