WTF! I turn FSAA up to 8x from 2xQ and I get 76 fps; at 2xQ I get almost 10 fps less. 2xQ is supposed to be WAY faster (with less smoothing, of course), right? I also get 76 to 80 with FSAA off. Something is REALLY weird.
EDIT: Ok that is the case for some reason in that VBO earth app. In Quake 3 I lost about 150fps from 2xQ to 8x FSAA.
-SirKnight
[This message has been edited by SirKnight (edited 11-03-2003).]
Originally posted by Csiki:
[b] I had the same problem with GeforceTi4200.
It seems that the new 52.16 driver does some cruel optimization with static data, but has a lot of problems when the data changes frequently (one change per frame)…
Use DYNAMIC_DRAW instead. Unfortunately that path seems to be a plain vertex array implementation…
[/b]
I NEVER change the vertex data once initially loaded, I only change vertex program parameters (shader constants).
In my own experience Radeons are beating GeforceFX using VBO with separate static arrays, running Catalyst 3.8 against 52.16. I put all my data into VBO memory and just call draw elements with buffer offsets. The speed-up over system-resident arrays is huge on the Radeons (500%), and much smaller with the GeforceFX (~100%).

Two figures I can find at the moment (I’ve a lot more, but they are on my other PC):

GeforceFX 5600 - 25.2 MTri/s
Radeon 9800 Pro - 183 MTri/s
5900 is ~60 MTri/s IIRC.

Those figures are from high-spec 3GHz+ P4s running XP Pro. To the best of my knowledge I’m not doing anything daft, as I’ve followed the spec closely and tried to account for all other variations. VBO appears to work well on ATI, but needs fixes or optimisations on Nvidia.
Ok I just ran the program again without changing anything and it goes from 71 to 135 fps. Strange how VBO all of a sudden makes a difference when a while ago it didn’t. Still though, this seems like it should be faster.
I should also add, with my 44 fps vs 8 fps (VBO) benchmark, everything is batched reasonably: roughly 7 tri-stripped batches per frame with about 3200 indices per batch.
First: Run VTune (or another sampling profiler) on your system while the program is running, both with VBO and without it. You’ll probably find a BIG spike somewhere in the VBO case. This is where you’re spending all your time. Look at the code (may need disassembling) – what is it trying to do? Packing/unpacking values? Copying data? Calculating min/max? Whatever it’s doing, figure out what part of the OpenGL API would need that performed, and make it unnecessary by adjusting how you call it.
For example, if it’s un-packing, say, signed bytes to floating-point (and this is just as a wild example), and you’re passing normals in as signed bytes, then you can draw the conclusion that this is not a supported data format, and you’re better off passing normals as float.
Second issue: if you get the “VPU Recover” alerts, then it’s very likely that your motherboard, memory bus, or AGP bus is not quite up to spec, and there’s either a chipset bug or a signal-quality problem. Raising voltages a little may help if it’s the latter; if it’s the former, get a better mobo.
Originally posted by jwatte:
Second issue: if you get the “VPU Recover” alerts, then it’s very likely that your motherboard, memory bus, or AGP bus is not quite up to spec, and there’s either a chipset bug, or a signal quality problem. Raising voltages a little bit may help if it’s the latter; if it’s the former, get a better mobo
Well, that could be; then again, I think it’s driver bugs, or very picky timing in the Cat drivers, since those same programs work just fine with my old GF2 card.
Just curious what is a ‘better’ mobo? I mean, what do you consider good?
So shorts and bytes are opposites as far as normalization is concerned.
here is an excerpt from an email from NVIDIA
> At first blush, my guess is that in the VBO case, the use of
> an attrib array as UNSIGNED_SHORT is causing us to fall back
> to non-pulling paths; we can’t do vertex pulling with all
> types of data. I looked quickly and I can see that for
> generic attribs, SHORT, FLOAT, and HALF_FLOAT are supported
> for pulling unless the attrib is normalized and then
> UNSIGNED_BYTE is added and SHORT is removed. In fact, I don’t
> think we support UNSIGNED_SHORT vertex pulling in any circumstance.
>
> When we fall back to inline methods, if the VBOs are in AGP
> memory, we read the data from there - which is very slow.
> This is likely the reason why the performance drops so much.
> As for why it speeds up in non-VBO, the data is put in system
> memory so the read to place it in the pushbuffer is fast.
>
> If the user changed the UNSIGNED_SHORT to SHORT, FLOAT, or
> HALF_FLOAT, performance should be greatly improved.
mtm
here is more useful info from NVIDIA.
For HALF_FLOAT usage, you should simply be able to pass in GL_HALF_FLOAT_NV
for the data type in place of GL_FLOAT. However, this will only be
accelerated on FX or better hardware - NV30 and beyond.
If you choose to use this data type, one issue you’ll need to keep in mind
is that you’ll need to format your data for use as half floats - you can’t
simply use the same 32bit float or 16bit short data. The spec defines the
format and conversion: http://oss.sgi.com/projects/ogl-sample/registry/NV/half_float.txt